Problem
I have a large file A containing email addresses, one per line. I also have a file B that contains another set of email addresses.
Which command would I use to remove from file A all of the addresses that appear in file B?
So, if file A included the following information:
A
B
C
and file B included the following information:
B
D
E
Then file A should be left with the following:
A
C
I realize this is a common question, but the only command I found online gave me an error about a bad delimiter.
Any help would be much appreciated! Someone will surely come up with a clever one-liner, but I’m no shell expert.
Asked by slhck
Solution #1
If the files are sorted (as they are in your example), use:
comm -23 file1 file2
-2 suppresses lines that appear only in file2 and -3 suppresses lines that appear in both files, so only the lines unique to file1 remain. If the files aren’t already sorted, run them through sort…
See the man page for further information.
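For unsorted input, the sorting step can be sketched as follows. This is a minimal example under a few assumptions: the scratch directory and the .sorted file names are made up for illustration, and LC_ALL=C is set so that sort and comm agree on the ordering. Note that the result comes out in sorted order, not in the original order of file1.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
printf '%s\n' C A B > "$dir/file1"   # deliberately unsorted
printf '%s\n' E B D > "$dir/file2"

# comm needs both inputs sorted with the same collation,
# so sort each file into a (hypothetical) .sorted copy first.
LC_ALL=C sort "$dir/file1" > "$dir/file1.sorted"
LC_ALL=C sort "$dir/file2" > "$dir/file2.sorted"

# -2 drops lines found only in file2, -3 drops lines found in both,
# which leaves the lines found only in file1.
result=$(LC_ALL=C comm -23 "$dir/file1.sorted" "$dir/file2.sorted")
printf '%s\n' "$result"

rm -r "$dir"
```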
Answered by The Archetypal Paul
Solution #2
grep -Fvxf <lines-to-remove> <all-lines>
Example:
cat <<EOF > A
b
1
a
0
01
b
1
EOF
cat <<EOF > B
0
1
EOF
grep -Fvxf B A
Output:
b
a
01
b
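Explanation:
-F treats each pattern as a fixed string rather than a regular expression.
-v prints only the lines that do not match.
-x requires a pattern to match the whole line.
-f FILE reads the patterns from FILE, one per line.
Because it is more general, this method is slower on pre-sorted files than the other methods. If you’re concerned about speed, take a look at: Fast way of finding lines in one file that are not in another?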
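To see why -F and -x matter in grep -Fvxf, here is a small sketch that runs the same removal with and without them. The file names all and remove and the sample lines are made up for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' 'a.c' 'abc' '01' '1' > "$dir/all"
printf '%s\n' 'a.c' '1' > "$dir/remove"

# Without -F and -x the patterns are regular expressions matched anywhere
# in the line: "a.c" also matches "abc", and "1" also matches "01".
# Every line is removed, so grep prints nothing and exits with status 1.
loose=$(grep -vf "$dir/remove" "$dir/all") || true

# With -F (fixed strings) and -x (whole-line match) only exact lines are removed.
exact=$(grep -Fvxf "$dir/remove" "$dir/all")

printf 'without -F -x: [%s]\n' "$loose"
printf 'with -F -x:\n%s\n' "$exact"

rm -r "$dir"
```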
For in-place operation, here’s a quick Bash helper:
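remove-lines() (
  remove_lines="$1"   # file listing the lines to delete
  all_lines="$2"      # file to edit in place
  tmp_file="$(mktemp)"
  # Write the surviving lines to a temporary file, then replace the original.
  grep -Fvxf "$remove_lines" "$all_lines" > "$tmp_file"
  mv "$tmp_file" "$all_lines"
)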
GitHub upstream.
usage:
remove-lines lines-to-remove remove-from-this-file
See also: https://unix.stackexchange.com/questions/28158/is-there-a-tool-to-get-the-lines-in-one-file-that-are-not-in-another
Answered by Ciro Santilli 新疆再教育营六四事件法轮功郝海东
Solution #3
awk saves the day!
This solution does not require sorted inputs. fileB must be given first.
awk 'NR==FNR{a[$0];next} !($0 in a)' fileB fileA
returns
A
C
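How does it work?
NR is the number of lines read so far across all files and FNR is the line number within the current file, so NR==FNR is true only while the first file (fileB) is being read. During that pass, a[$0] stores each line as a key of the array a, and next skips to the next line. For fileA, !($0 in a) is true for lines that were not stored, and because no action is given, awk prints them.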
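The same one-liner, spread over several lines with comments, run on the sample data from the question. This is only a sketch: the scratch directory and the array name seen are chosen for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' A B C > "$dir/fileA"
printf '%s\n' B D E > "$dir/fileB"

result=$(awk '
    # NR counts lines across all inputs and FNR restarts for each file,
    # so NR == FNR holds only while the first file (fileB) is being read.
    NR == FNR { seen[$0]; next }   # record the line as a key, then skip ahead

    # Lines of later files: the pattern is true when the line was never recorded.
    # With no action given, awk prints the line.
    !($0 in seen)
' "$dir/fileB" "$dir/fileA")
printf '%s\n' "$result"

rm -r "$dir"
```

One caveat of the NR==FNR idiom: if the first file is empty, the condition stays true while the second file is read, so nothing is printed.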
The same approach can be used to remove blacklisted words from a word list:
$ awk '...' badwords allwords > goodwords
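With a small change, it can clean several lists and write a cleaned version of each. The parentheses around the output file name keep the expression unambiguous across awk implementations.
$ awk 'NR==FNR{a[$0];next} !($0 in a){print > (FILENAME".clean")}' bad file1 file2 file3 ...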
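A small run of the multi-file variant, as a sketch with made-up file contents. The output name is parenthesized here because some awk implementations parse an unparenthesized concatenation after > differently.

```shell
dir=$(mktemp -d)
printf '%s\n' spam junk > "$dir/bad"
printf '%s\n' hello spam world > "$dir/file1"
printf '%s\n' junk fine > "$dir/file2"

# The first file fills the set of unwanted lines; each later file
# gets its surviving lines written to <name>.clean next to it.
awk 'NR==FNR{a[$0];next} !($0 in a){print > (FILENAME ".clean")}' \
    "$dir/bad" "$dir/file1" "$dir/file2"

clean1=$(cat "$dir/file1.clean")
clean2=$(cat "$dir/file2.clean")
printf 'file1.clean:\n%s\nfile2.clean:\n%s\n' "$clean1" "$clean2"

rm -r "$dir"
```

A file whose lines are all removed produces no .clean file at all, since print is never reached for it.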
Answered by karakfa
Solution #4
Another way to do the same thing (this one also requires sorted input):
join -v 1 fileA fileB
If the files are not already sorted, in Bash:
join -v 1 <(sort fileA) <(sort fileB)
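One thing to keep in mind with join: by default it compares only the first whitespace-separated field of each line. That is fine for email addresses, but not for lines that contain spaces. The sketch below uses made-up names and assumes the data contains no tab characters, so that -t with a tab makes join treat each whole line as a single field.

```shell
dir=$(mktemp -d)
# Both files are already sorted.
printf '%s\n' 'alice smith' 'bob jones' > "$dir/fileA"
printf '%s\n' 'alice cooper' > "$dir/fileB"

# Default: only the first field ("alice") is compared, so "alice smith"
# pairs with "alice cooper" and is dropped by mistake.
loose=$(LC_ALL=C join -v 1 "$dir/fileA" "$dir/fileB")

# With a tab as the field separator, a line without tabs is one field,
# so whole lines are compared.
tab=$(printf '\t')
exact=$(LC_ALL=C join -t "$tab" -v 1 "$dir/fileA" "$dir/fileB")

printf 'default:\n%s\nwith -t tab:\n%s\n' "$loose" "$exact"

rm -r "$dir"
```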
Answered by Dennis Williamson
Solution #5
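If your files are sorted, you can do this with GNU diff. Write the result to a temporary file first, because redirecting straight to file-a would truncate it before diff reads it:
diff file-a file-b --new-line-format="" --old-line-format="%L" --unchanged-line-format="" > file-a.tmp
mv file-a.tmp file-a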
--new-line-format is for lines that are in file-b but not in file-a.
--old-line-format is for lines that are in file-a but not in file-b.
--unchanged-line-format is for lines that are in both.
%L prints the line exactly as it is, including its trailing newline.
See man diff for more details.
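When the result should end up back in file-a, note that a redirection straight onto file-a truncates the file before diff gets to read it. A minimal sketch on the question's sample data, assuming GNU diff (the line-format options are GNU extensions) and a hypothetical file-a.tmp as the intermediate file:

```shell
dir=$(mktemp -d)
printf '%s\n' A B C > "$dir/file-a"
printf '%s\n' B D E > "$dir/file-b"

# diff exits with 0 (no differences) or 1 (differences found) on success
# and with 2 on trouble, so only replace file-a when the status is 0 or 1.
status=0
diff --new-line-format="" --old-line-format="%L" --unchanged-line-format="" \
    "$dir/file-a" "$dir/file-b" > "$dir/file-a.tmp" || status=$?
if [ "$status" -le 1 ]; then
    mv "$dir/file-a.tmp" "$dir/file-a"
fi

result=$(cat "$dir/file-a")
printf '%s\n' "$result"

rm -r "$dir"
```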
Answered by aec
Post is based on https://stackoverflow.com/questions/4366533/how-to-remove-the-lines-which-appear-on-file-b-from-another-file-a