Problem
“a.txt” and “b.txt” are both text files containing a list of words. I’d like to find which words are in “a.txt” but not in “b.txt”.
Since I am comparing two dictionaries, I need a fast algorithm.
Asked by Ali Imran
Solution #1
Try this if you have vim installed:
vimdiff file1 file2
or
vim -d file1 file2
This opens the two files side by side and highlights the differences.
Answered by Fengya Li
Solution #2
Sort both files and compare them with the comm command:
comm -23 <(sort a.txt) <(sort b.txt)
comm compares sorted input files and by default outputs three columns: lines unique to a, lines unique to b, and lines that appear in both. You can suppress the corresponding column by passing -1, -2, and/or -3, so comm -23 a b displays only the entries that are unique to a. I sort the files on the fly with the <(...) process substitution syntax; if they are already sorted, you don’t need it.
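The <(...) form above needs bash, zsh, or ksh. For a plain POSIX sh, a minimal sketch is to sort into temporary files first; the helper name only_in_first is hypothetical. It also pins LC_ALL=C for both sort and comm, since comm expects its input to be sorted in the same collation order it uses itself.

```shell
#!/bin/sh
# only_in_first FILE1 FILE2 (hypothetical helper name): print the words in
# FILE1 that do not occur in FILE2, i.e. comm's first column.
# Uses temporary files instead of <(...), which plain POSIX sh lacks.
only_in_first() {
    tmp1=$(mktemp) || return 2
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 2; }
    # Sort and deduplicate (-u) with the same C collation that comm uses below,
    # so comm never sees input it considers unsorted.
    LC_ALL=C sort -u "$1" > "$tmp1" &&
        LC_ALL=C sort -u "$2" > "$tmp2" &&
        LC_ALL=C comm -23 "$tmp1" "$tmp2"   # -2 and -3 hide columns 2 and 3
    status=$?
    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

Usage: only_in_first a.txt b.txt > only_in_a.txt. The -u is a design choice for word lists: a word listed twice in a.txt but once in b.txt is not reported, whereas plain comm -23 on sorted input would report the extra copy.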
Answered by Anders Johansson
Solution #3
You can use git diff with the --no-index flag to compare files that aren’t in a git repository, if you like the diff output style:
git diff --no-index a.txt b.txt
I benchmarked this approach (with the built-in time command) against some of the other answers here, using a handful of files with roughly 200k file name strings in each:
git diff --no-index a.txt b.txt
# ~1.2s
comm -23 <(sort a.txt) <(sort b.txt)
# ~0.2s
diff a.txt b.txt
# ~2.6s
sdiff a.txt b.txt
# ~2.7s
vimdiff a.txt b.txt
# ~3.2s
comm looks to be the fastest by a long shot, whereas git diff --no-index appears to be the quickest way to generate diff-style output.
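To rerun a comparison like this on your own machine, here is a minimal sketch that generates two synthetic lists of about that size (half of each file overlapping) and times comm, diff, and, if installed, git diff --no-index. The helper names make_lists and bench and the file_NNNNNNN names are hypothetical, and no timings are claimed here, since they depend on hardware and data. Timing relies on bash’s time keyword.

```shell
#!/usr/bin/env bash
# make_lists DIR N (hypothetical helper): write DIR/a.txt with file_0000001..N
# and DIR/b.txt starting halfway through a.txt, so half of each file overlaps.
# Zero-padded names keep both files already sorted, so comm needs no sort step.
make_lists() {
    awk -v n="$2" 'BEGIN { for (i = 1; i <= n; i++) printf "file_%07d\n", i }' > "$1/a.txt"
    awk -v n="$2" 'BEGIN { for (i = int(n / 2) + 1; i <= n + int(n / 2); i++) printf "file_%07d\n", i }' > "$1/b.txt"
}

# bench DIR (hypothetical helper): time each approach on the generated files,
# discarding the comparison output so only the timings appear (on stderr).
bench() {
    echo "comm -23:"
    time comm -23 "$1/a.txt" "$1/b.txt" > /dev/null
    echo "diff:"
    time diff "$1/a.txt" "$1/b.txt" > /dev/null
    # Only time git if it is installed.
    if command -v git > /dev/null 2>&1; then
        echo "git diff --no-index:"
        time git diff --no-index "$1/a.txt" "$1/b.txt" > /dev/null
    fi
}
```

Usage, in bash: dir=$(mktemp -d); make_lists "$dir" 200000; bench "$dir". Expect the numbers to differ from the ones quoted above.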
Update (March 25, 2018): according to the git-diff man page, you can omit the --no-index switch unless you’re inside a git repository and want to compare untracked files within that repository.
Answered by joelostblom
Solution #4
Try the sdiff command (see man sdiff); the -s option suppresses the lines the two files have in common:
sdiff -s file1 file2
Answered by mudrii
Solution #5
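To compare two files in Linux, use the diff program. To filter its output down to the data you need, use the --changed-group-format and --unchanged-group-format options.
Each option takes a format string, and these three values select what is printed for a group: '%<' prints the group’s lines from the first file, '%>' prints its lines from the second file, and '' (an empty string) prints nothing. For example, to print only the lines of file1.txt that differ from file2.txt: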
[root@vmoracle11 tmp]# cat file1.txt
test one
test two
test three
test four
test eight
[root@vmoracle11 tmp]# cat file2.txt
test one
test three
test nine
[root@vmoracle11 tmp]# diff --changed-group-format='%<' --unchanged-group-format='' file1.txt file2.txt
test two
test four
test eight
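Swapping '%<' for '%>' gives the opposite direction. Below is a minimal sketch, assuming GNU diff (the helper name only_in_second is hypothetical). It relies on GNU diff’s documented default that groups containing only old or only new lines reuse the changed-group format when one is given. One caveat: diff aligns the files by position (a longest common subsequence), so with unsorted lists a word present in both files but at a different place can still be reported; sorting both files first, as in the comm answer, avoids that.

```shell
#!/bin/sh
# only_in_second FILE1 FILE2 (hypothetical helper name): print the lines that
# diff reports on the FILE2 side of its change groups, i.e. lines added in FILE2.
# Assumes GNU diff, which provides the --*-group-format options.
only_in_second() {
    # %> expands to the group's lines from FILE2; unchanged groups print nothing,
    # and old-only groups inherit the changed format, so they print nothing too.
    diff --changed-group-format='%>' --unchanged-group-format='' "$1" "$2"
    # diff exits 0 (same), 1 (different) or 2 (trouble); only 2 is an error here.
    [ "$?" -le 1 ]
}
```

With the two files from the transcript above, only_in_second file1.txt file2.txt prints test nine.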
Answered by Manjula
Post is based on https://stackoverflow.com/questions/14500787/comparing-two-files-in-linux-terminal