regex - Remove duplicates from text file based on second text file
How can I remove all lines from a text file (main.txt) by checking them against a second text file (removethese.txt)? What is an efficient approach if the files are larger than 10-100 MB? [Using a Mac]
Example:

main.txt
3
1
2
5

Remove these lines:

removethese.txt
3
2
9

Output:

output.txt
1
5
Example lines (these are the actual lines I'm working with; order does not matter):
chijw3p7xz8yyikrbd_tjkgjrs0 chij08x-0kmayikr5ccrf-xt6za chijixbjoykfyikrzugzz6tio1u chijiaf4aooeyikr2c9wyapwdxm chij39hopkdix4krcfdirxivrqs chijk5nev8chyikrihmxier5ak8 chijs9inbrcfyikrf0zlka1njeg chijrycysg0cyikrarqactwz-e8 chijc8haxludyikrfsfjoqwe698 chijxrvp80zpcearavmzvlcwa24 chijw8_laaeeyikr68nb8cpalsu chijs35yqobit4kr05f4cxshd_8 chijormgsdwgyikrvlbhoe7xahq chijattwbawyvogrcppdyk42-nc chijtujgaqunvogr90kc8hriw8c chijn7p2nf8evigrwxdzecjl5eq chijizgc0lsbvigrdlis85m5dbs chijc8h6zqccvigr7u5aefjxjjc chij6ymovoeyvogrjjcmcl6oqco chij54hccsaevogriy9___rgz6o chijif92qn2yvogr87n0-9r5tla chij0t5e1yayvogrifrl7s_oem8 chijwwgce4eyvogrcrfc5pvznd4
There are two standard ways to do this:
With grep:
grep -vxFf removethese main
This uses:
-v to invert the match.
-x to match the whole line, which prevents, for example, he from matching lines like hello or highway to hell.
-F to use fixed strings, so that each pattern is taken as it is and not interpreted as a regular expression.
-f to get the patterns from a file. In this case, from removethese.
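To see how these flags interact, here is a minimal sketch that recreates the sample data from the question in a scratch directory and then shows what happens when -x is left out. The names workdir, words.txt, and pat.txt are made up for illustration, and the sketch assumes one value per line in each file.

```shell
# Scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)

# Sample data from the question, one value per line.
printf '%s\n' 3 1 2 5 > "$workdir/main.txt"
printf '%s\n' 3 2 9 > "$workdir/removethese.txt"

# -v invert, -x whole line, -F fixed strings, -f read patterns from a file.
grep -vxFf "$workdir/removethese.txt" "$workdir/main.txt" > "$workdir/output.txt"
cat "$workdir/output.txt"   # prints 1 and 5, each on its own line

# Effect of -x: without it, the pattern "he" also removes longer lines.
printf '%s\n' he hello 'highway to hell' > "$workdir/words.txt"
printf '%s\n' he > "$workdir/pat.txt"
with_x=$(grep -vxFf "$workdir/pat.txt" "$workdir/words.txt")
# grep exits with status 1 when nothing is printed, hence the "|| true".
without_x=$(grep -vFf "$workdir/pat.txt" "$workdir/words.txt" || true)

printf 'with -x:\n%s\n' "$with_x"
printf 'without -x:\n%s\n' "$without_x"
```

With -x only the exact line he is dropped, so hello and highway to hell survive. Without -x every line contains he somewhere, so nothing is left.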
With awk:
$ awk 'FNR==NR {a[$0];next} !($0 in a)' removethese main
1
5
This stores every line of removethese in an array a[]. Then it reads the main file and prints only those lines that are not present in the array.
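The two-pass logic can be spelled out with comments in a small sketch. It reuses the sample data from the question. The scratch directory workdir and the array name seen are illustrative choices, not part of the original answer.

```shell
# Scratch directory with the sample data, one value per line.
workdir=$(mktemp -d)
printf '%s\n' 3 1 2 5 > "$workdir/main.txt"
printf '%s\n' 3 2 9 > "$workdir/removethese.txt"

awk '
  # While reading the first file, FNR (per-file line number) equals NR
  # (overall line number): remember each line as an array key and move on.
  FNR == NR { seen[$0]; next }
  # For the second file: print only lines that were never stored.
  !($0 in seen)
' "$workdir/removethese.txt" "$workdir/main.txt" > "$workdir/output.txt"

cat "$workdir/output.txt"   # prints 1 and 5, each on its own line
```

Only the lines of the first file are held in memory, so it makes sense to pass the removal list first. One caveat: if the first file is empty, FNR==NR is also true for the second file, so nothing is printed.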