regex - Remove duplicates from text file based on second text file -


How can I remove lines from a text file (main.txt) by checking them against a second text file (removethese.txt)? What is an efficient approach if the files are larger than 10-100 MB? [Using Mac]

Example:

main.txt

    3
    1
    2
    5

Remove these lines:

removethese.txt

    3
    2
    9

Output:

output.txt

    1
    5

Example lines (these are the actual lines I'm working with; order does not matter):

    chijw3p7xz8yyikrbd_tjkgjrs0
    chij08x-0kmayikr5ccrf-xt6za
    chijixbjoykfyikrzugzz6tio1u
    chijiaf4aooeyikr2c9wyapwdxm
    chij39hopkdix4krcfdirxivrqs
    chijk5nev8chyikrihmxier5ak8
    chijs9inbrcfyikrf0zlka1njeg
    chijrycysg0cyikrarqactwz-e8
    chijc8haxludyikrfsfjoqwe698
    chijxrvp80zpcearavmzvlcwa24
    chijw8_laaeeyikr68nb8cpalsu
    chijs35yqobit4kr05f4cxshd_8
    chijormgsdwgyikrvlbhoe7xahq
    chijattwbawyvogrcppdyk42-nc
    chijtujgaqunvogr90kc8hriw8c
    chijn7p2nf8evigrwxdzecjl5eq
    chijizgc0lsbvigrdlis85m5dbs
    chijc8h6zqccvigr7u5aefjxjjc
    chij6ymovoeyvogrjjcmcl6oqco
    chij54hccsaevogriy9___rgz6o
    chijif92qn2yvogr87n0-9r5tla
    chij0t5e1yayvogrifrl7s_oem8
    chijwwgce4eyvogrcrfc5pvznd4

There are two standard ways to do this:

With grep:

    grep -vxFf removethese main

This uses:

  • -v to invert the match.
  • -x to match the whole line. This prevents, for example, he from matching lines such as hello or highway to hell.
  • -F to use fixed strings, so that each pattern is taken as is and not interpreted as a regular expression.
  • -f to read the patterns from a file. In this case, from removethese.
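To make the flag combination concrete, here is a minimal sketch that rebuilds the question's sample files in a scratch directory and runs the grep command on them. The function name remove_listed_lines and the scratch directory are illustrative assumptions, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of file $2 that do not appear,
# as whole lines, in file $1.
remove_listed_lines() {
    # -v invert the match, -x whole-line match,
    # -F fixed strings (no regex), -f read patterns from a file
    grep -vxFf "$1" "$2"
}

# Recreate the sample data from the question (paths are assumptions).
workdir=$(mktemp -d)
printf '%s\n' 3 1 2 5 > "$workdir/main.txt"
printf '%s\n' 3 2 9   > "$workdir/removethese.txt"

remove_listed_lines "$workdir/removethese.txt" "$workdir/main.txt" > "$workdir/output.txt"
cat "$workdir/output.txt"
```

With the sample data this prints 1 and 5, each on its own line, in the same order as they appear in main.txt.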

With awk:

    $ awk 'FNR==NR {a[$0];next} !($0 in a)' removethese main
    1
    5

This stores every line of removethese in an array a[]. Then it reads the main file and prints only those lines that are not present in the array.
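The two-pass awk logic can be sketched the same way on the question's sample data. The function name remove_listed_lines_awk and the scratch directory are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of file $2 that are not present in file $1.
remove_listed_lines_awk() {
    # While reading the first file, FNR (per-file line number) equals NR
    # (overall line number), so each of its lines becomes a key in array a.
    # For the second file the first rule no longer fires, and a line is
    # printed only if it is not a key in a.
    awk 'FNR==NR {a[$0]; next} !($0 in a)' "$1" "$2"
}

# Recreate the sample data from the question (paths are assumptions).
workdir=$(mktemp -d)
printf '%s\n' 3 1 2 5 > "$workdir/main.txt"
printf '%s\n' 3 2 9   > "$workdir/removethese.txt"

remove_listed_lines_awk "$workdir/removethese.txt" "$workdir/main.txt"
```

On the sample data this prints 1 and 5. One caveat with this idiom: if the first file is empty, FNR==NR also holds while the second file is read, so nothing is printed.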

