Coder Perfect

How do I remove the lines that appear in file B from another file A?

Problem

I have a large file A containing email addresses, one per line. I also have a file B containing another set of addresses.

Which command would I use to remove all of the addresses from file A that appear in file B?

So, if file A included the following information:

A
B
C

and file B included the following information:

B
D
E

Then file A should be left with the following:

A
C

Now, I know this is a common question, but the only command I found online gave me an error about a bad delimiter.

Any help would be much appreciated! Somebody will surely come up with a clever one-liner, but I'm no shell expert.

Asked by slhck

Solution #1

If the files are sorted (as they are in your example), use comm:

comm -23 file1 file2

-23 suppresses the lines that appear in both files (-3) and the lines that appear only in file 2 (-2), leaving the lines unique to file 1. If the files aren't already sorted, run them through sort first.

See the man page for further information.
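For unsorted input, the sort step can be sketched as follows. This is a minimal example, not the answerer's exact command: the scratch directory and the .sorted file names are assumptions made for illustration, and LC_ALL=C is set so that sort and comm agree on the ordering.

```shell
# Hypothetical unsorted sample files in a scratch directory
workdir=$(mktemp -d)
printf '%s\n' C A B > "$workdir/fileA"
printf '%s\n' E B D > "$workdir/fileB"

# comm expects both inputs sorted in the same collation order,
# so sort each file into a temporary copy first
LC_ALL=C sort "$workdir/fileA" > "$workdir/fileA.sorted"
LC_ALL=C sort "$workdir/fileB" > "$workdir/fileB.sorted"

# -2 drops lines unique to the second file, -3 drops lines common to both
result=$(LC_ALL=C comm -23 "$workdir/fileA.sorted" "$workdir/fileB.sorted")
printf '%s\n' "$result"
```

Note that the result comes out in sorted order, so the original line order of file A is not preserved.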

Answered by The Archetypal Paul

Solution #2

grep -Fvxf <lines-to-remove> <all-lines>

Example:

cat <<EOF > A
b
1
a
0
01
b
1
EOF

cat <<EOF > B
0
1
EOF

grep -Fvxf B A

Output:

b
a
01
b

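Explanation:

-F: treat the patterns as literal strings rather than regular expressions
-v: invert the match, printing only the lines that do not match
-x: match whole lines only, so that the pattern 0 does not remove the line 01
-f B: read the patterns from file B, one per line

This method is slower on pre-sorted files than the other methods, because it is more general. If you're concerned about speed, take a look at: Fast way of finding lines in one file that are not in another?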
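To see why the -x and -F flags matter for a list of email addresses, here is a small sketch. The sample addresses and file names are made up for illustration.

```shell
workdir=$(mktemp -d)

# Same data as the example above
printf '%s\n' b 1 a 0 01 b 1 > "$workdir/A"
printf '%s\n' 0 1 > "$workdir/B"

# With -x, a line is removed only if it equals a pattern in full
with_x=$(grep -Fvxf "$workdir/B" "$workdir/A")
# Without -x, patterns match substrings, so the line 01 is removed as well
without_x=$(grep -Fvf "$workdir/B" "$workdir/A")

# Hypothetical addresses: the dot in a.com is a regex wildcard unless -F is given
printf '%s\n' 'x@a.com' 'x@abcom' > "$workdir/emails"
printf '%s\n' 'x@a.com' > "$workdir/remove"

# With -F, only the literal address is removed
with_F=$(grep -Fvxf "$workdir/remove" "$workdir/emails")
# Without -F, x@abcom also matches the pattern, so nothing is left;
# grep then exits with status 1, hence the || true
without_F=$(grep -vxf "$workdir/remove" "$workdir/emails" || true)
```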

For in-place operation, here's a quick bash automation:

remove-lines() (
  remove_lines="$1"
  all_lines="$2"
  tmp_file="$(mktemp)"
  grep -Fvxf "$remove_lines" "$all_lines" > "$tmp_file"
  mv "$tmp_file" "$all_lines"
)

GitHub upstream.

usage:

remove-lines lines-to-remove remove-from-this-file

See also: https://unix.stackexchange.com/questions/28158/is-there-a-tool-to-get-the-lines-in-one-file-that-are-not-in-another

Answered by Ciro Santilli 新疆再教育营六四事件法轮功郝海东

Solution #3

awk saves the day!

This solution doesn't require sorted inputs. You have to provide fileB first.

awk 'NR==FNR{a[$0];next} !($0 in a)' fileB fileA

returns

A
C

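How does it work?

NR==FNR is true only while awk reads the first file (fileB), because only there does the global record counter NR equal the per-file counter FNR. For those lines, a[$0] creates an array entry keyed by the whole line, so a behaves like a set, and next skips to the next line. For the remaining file(s), !($0 in a) is true when the current line is not in that set; since no action is given, awk applies the default action, which prints the line.

Note that this can now be used to remove blacklisted words.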

$ awk '...' badwords allwords > goodwords
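The one-liner can also be spelled out in a commented, multi-line form. This is only a sketch of the same idiom: the array name seen and the scratch files are illustrative choices, not part of the original answer.

```shell
workdir=$(mktemp -d)
printf '%s\n' A B C > "$workdir/fileA"
printf '%s\n' B D E > "$workdir/fileB"

result=$(awk '
  # First file only: the global counter NR equals the per-file counter FNR.
  # Store each line as a key of the array, then skip to the next line.
  NR == FNR { seen[$0]; next }

  # Remaining files: print the line unless it was stored above.
  !($0 in seen) { print }
' "$workdir/fileB" "$workdir/fileA")
printf '%s\n' "$result"
```

One caveat with this idiom: if fileB is empty, NR==FNR also holds for every line of fileA, so nothing is printed.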

With a slight change, the same one-liner can clean multiple lists and write a cleaned version of each.

$ awk 'NR==FNR{a[$0];next} !($0 in a){print > (FILENAME ".clean")}' bad file1 file2 file3 ...
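As a quick check of the multi-file variant, the following sketch runs it on two made-up lists. The file contents and the scratch directory are assumptions, and the output file name is parenthesized because an unparenthesized concatenation after > is not portable across awk implementations.

```shell
workdir=$(mktemp -d)
printf '%s\n' B > "$workdir/bad"
printf '%s\n' A B C > "$workdir/file1"
printf '%s\n' B D > "$workdir/file2"

# Each input file gets its own output, named after the input plus .clean
awk 'NR==FNR{a[$0];next} !($0 in a){print > (FILENAME ".clean")}' \
  "$workdir/bad" "$workdir/file1" "$workdir/file2"

clean1=$(cat "$workdir/file1.clean")
clean2=$(cat "$workdir/file2.clean")
```

Here file1.clean ends up with A and C, and file2.clean with D. A list whose lines are all blacklisted produces no .clean file at all, since nothing is ever printed to it.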

Answered by karakfa

Solution #4

Another way to do the same thing (this also requires sorted input):

join -v 1 fileA fileB

If the files are not pre-sorted, in Bash you can use:

join -v 1 <(sort fileA) <(sort fileB)

Answered by Dennis Williamson

Solution #5

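You can also do this with GNU diff, provided the lines common to both files appear in the same relative order (for example, when both files are sorted). Write the result to a scratch file first (file-a.tmp here is just a placeholder name), since redirecting straight to file-a would truncate it before diff reads it:

diff file-a file-b --new-line-format="" --old-line-format="%L" --unchanged-line-format="" > file-a.tmp
mv file-a.tmp file-a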

--new-line-format is for lines that are in file b but not in file a.
--old-line-format is for lines that are in file a but not in file b.
--unchanged-line-format is for lines that are in both.
%L prints the line exactly as it is, including its trailing newline.

See man diff for more details.
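One thing worth checking with the diff approach is how it behaves when the shared lines appear in a different order in the two files. The sketch below assumes GNU diff and uses made-up sample files. Because diff can treat only one of B and C as unchanged here, its output contains a line that grep -Fvxf removes.

```shell
workdir=$(mktemp -d)
printf '%s\n' C B A > "$workdir/file-a"
printf '%s\n' B C > "$workdir/file-b"

# diff exits with status 1 when the files differ, hence the || true
diff_out=$(diff --new-line-format="" --old-line-format="%L" \
  --unchanged-line-format="" "$workdir/file-a" "$workdir/file-b" || true)

# Order-independent reference result for comparison
grep_out=$(grep -Fvxf "$workdir/file-b" "$workdir/file-a")

printf 'diff: %s\n' "$diff_out"
printf 'grep: %s\n' "$grep_out"
```

If the order of the lines may differ between the files, sorting both first or using one of the order-independent solutions above avoids this.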

Answered by aec

Post is based on https://stackoverflow.com/questions/4366533/how-to-remove-the-lines-which-appear-on-file-b-from-another-file-a