Remove duplicates from text file based on second text file
How can I remove lines from a text file (main.txt) by checking them against a second text file (removethese.txt)? What is an efficient approach if the files are larger than 10-100 MB? [Using Mac]
Example:

main.txt:
3
1
2
5

Remove these lines:

removethese.txt:
3
2
9

Output:

output.txt:
1
5

Example lines (these are the actual lines I'm working with; order does not matter):
chijw3p7xz8yyikrbd_tjkgjrs0
chij08x-0kmayikr5ccrf-xt6za
chijixbjoykfyikrzugzz6tio1u
chijiaf4aooeyikr2c9wyapwdxm
chij39hopkdix4krcfdirxivrqs
chijk5nev8chyikrihmxier5ak8
chijs9inbrcfyikrf0zlka1njeg
chijrycysg0cyikrarqactwz-e8
chijc8haxludyikrfsfjoqwe698
chijxrvp80zpcearavmzvlcwa24
chijw8_laaeeyikr68nb8cpalsu
chijs35yqobit4kr05f4cxshd_8
chijormgsdwgyikrvlbhoe7xahq
chijattwbawyvogrcppdyk42-nc
chijtujgaqunvogr90kc8hriw8c
chijn7p2nf8evigrwxdzecjl5eq
chijizgc0lsbvigrdlis85m5dbs
chijc8h6zqccvigr7u5aefjxjjc
chij6ymovoeyvogrjjcmcl6oqco
chij54hccsaevogriy9___rgz6o
chijif92qn2yvogr87n0-9r5tla
chij0t5e1yayvogrifrl7s_oem8
chijwwgce4eyvogrcrfc5pvznd4
There are two standard ways to do this:
With grep:

grep -vxFf removethese main

This uses:

-v: invert the match.
-x: match the whole line, to prevent, for example, "he" from matching lines like "hello" or "highway to hell".
-F: use fixed strings, so the parameter is taken as is, not interpreted as a regular expression.
-f: read the patterns from a file. In this case, from removethese.
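A minimal shell sketch of the grep approach, assuming a POSIX shell and the file names from the question (main.txt, removethese.txt) rather than the bare names used in the command. The files patterns.txt and words.txt are hypothetical scratch files, used only to show what -x changes.

```shell
#!/bin/sh
# Recreate the small example from the question.
printf '3\n1\n2\n5\n' > main.txt
printf '3\n2\n9\n' > removethese.txt

# -v invert, -x whole-line match, -F fixed strings, -f patterns from a file.
grep -vxFf removethese.txt main.txt > output.txt
cat output.txt    # prints 1 and 5

# Why -x matters (hypothetical scratch files):
printf 'he\n' > patterns.txt
printf 'he\nhello\nhighway to hell\n' > words.txt

# Without -x, "he" matches as a substring, so every line is removed.
grep -vFf patterns.txt words.txt
# With -x, only the line that is exactly "he" is removed.
grep -vxFf patterns.txt words.txt   # prints "hello" and "highway to hell"
```

Note that the first grep without -x prints nothing and exits with status 1, which is how grep reports "no lines selected", not an error.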
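With awk:

$ awk 'FNR==NR {a[$0];next} !($0 in a)' removethese main
1
5

It works like this: while reading the first file (FNR==NR is true only there), store every line of removethese as a key in the array a[] and skip to the next line. Then, while reading main, print only the lines that are not present in the array.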
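The same awk idiom as a self-contained sketch, again assuming a POSIX shell and the question's file names, with the result written to output.txt as the question asks:

```shell
#!/bin/sh
# Recreate the small example from the question.
printf '3\n1\n2\n5\n' > main.txt
printf '3\n2\n9\n' > removethese.txt

# First file (removethese.txt): FNR==NR, so store each line as a key of a.
# Second file (main.txt): print the line only if it is not a key of a.
awk 'FNR==NR { a[$0]; next } !($0 in a)' removethese.txt main.txt > output.txt
cat output.txt    # prints 1 and 5
```

With this idiom only removethese.txt is kept in memory; main.txt is streamed line by line and its original order is preserved. One caveat: if removethese.txt is empty, FNR==NR is also true for every line of main.txt, so all of main.txt is stored in the array and nothing is printed.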