Coder Perfect

Using the Linux terminal to compare two files


Both “a.txt” and “b.txt” are text files, each containing a list of words. I’d like to find which words are in “a.txt” but not in “b.txt”.

Because I need to compare two dictionaries, I need a fast algorithm.

Asked by Ali Imran

Solution #1

Try this if you have vim installed:

vimdiff file1 file2

or

vim -d file1 file2

You’ll find it works very well.

Answered by Fengya Li

Solution #2

Sort them and use comm:

comm -23 <(sort a.txt) <(sort b.txt)

comm compares (sorted) input files and outputs three columns by default: lines unique to a, lines unique to b, and lines that appear in both. You can suppress a column by supplying -1, -2, and/or -3. As a result, comm -23 a b displays only the lines unique to a. I sort the files on the fly with the <(...) syntax; if they are already sorted, you don’t need this.
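
As a small illustration of the pre-sorted case, here is a sketch that sorts each file once and then runs comm on the sorted copies. The sample words and the temporary file names are made up for the example. Setting LC_ALL=C is an assumption added here so that sort and comm agree on the ordering; it is not part of the original answer.

```shell
# Hypothetical sample data; the words and file names are for illustration only.
tmp=$(mktemp -d)
printf '%s\n' pear apple cherry > "$tmp/a.txt"
printf '%s\n' cherry banana > "$tmp/b.txt"

# Sort each file once, using a fixed locale so sort and comm use the same order.
LC_ALL=C sort "$tmp/a.txt" > "$tmp/a.sorted"
LC_ALL=C sort "$tmp/b.txt" > "$tmp/b.sorted"

# -2 drops the lines unique to b and -3 drops the lines common to both,
# leaving only the lines unique to a.
only_in_a=$(LC_ALL=C comm -23 "$tmp/a.sorted" "$tmp/b.sorted")
printf '%s\n' "$only_in_a"
# prints:
# apple
# pear

rm -rf "$tmp"
```

Keeping the sorted copies around can be worthwhile if the same word lists are compared repeatedly, since the sort then happens only once.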

Answered by Anders Johansson

Solution #3

If you prefer diff-style output, you can use git diff with the --no-index flag to compare files that aren’t in a git repository:

git diff --no-index a.txt b.txt

I benchmarked this approach (with the built-in time command) against some of the other answers here, using a handful of files with roughly 200k file name strings in each:

git diff --no-index a.txt b.txt
# ~1.2s

comm -23 <(sort a.txt) <(sort b.txt)
# ~0.2s

diff a.txt b.txt
# ~2.6s

sdiff a.txt b.txt
# ~2.7s

vimdiff a.txt b.txt
# ~3.2s

comm looks to be the fastest by a long shot, whereas git diff --no-index appears to be the quickest way to generate diff-style output.

Update (March 25, 2018): According to the git diff man page, you can omit the --no-index switch unless you’re inside a git repository and want to compare untracked files within that repository.

Answered by joelostblom

Solution #4

Try the sdiff command (man sdiff)

sdiff -s file1 file2

Answered by mudrii

Solution #5

To compare two files in Linux, use the diff program. To filter the output down to the lines you need, use the --changed-group-format and --unchanged-group-format options.

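Each of these options takes a format string that selects which lines to print for that group: '%<' prints the lines from file1, '%>' prints the lines from file2, and '' (an empty string) prints nothing.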

[root@vmoracle11 tmp]# cat file1.txt 
test one
test two
test three
test four
test eight
[root@vmoracle11 tmp]# cat file2.txt 
test one
test three
test nine
[root@vmoracle11 tmp]# diff --changed-group-format='%<' --unchanged-group-format='' file1.txt file2.txt 
test two
test four
test eight
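
The reverse direction works the same way. As a sketch, assuming GNU diff (these group-format options are GNU extensions) and the same two sample files recreated in a temporary directory, swapping '%<' for '%>' prints the lines that appear only in file2.txt:

```shell
# Recreate the sample files from the session above in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'test one' 'test two' 'test three' 'test four' 'test eight' > "$tmp/file1.txt"
printf '%s\n' 'test one' 'test three' 'test nine' > "$tmp/file2.txt"

# '%>' prints the file2 side of each changed group; unchanged groups print nothing.
# diff exits with status 1 when the files differ, so "|| true" keeps a
# script running under "set -e" from stopping here.
only_in_file2=$(diff --changed-group-format='%>' --unchanged-group-format='' \
    "$tmp/file1.txt" "$tmp/file2.txt" || true)
printf '%s\n' "$only_in_file2"
# prints:
# test nine

rm -rf "$tmp"
```

Note that diff compares lines in order, so unlike comm on sorted input, the result depends on where each line sits in the file.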

Answered by Manjula
