Coder Perfect

In Unix/Linux, what’s the quickest way to see if two files have the same contents?


I’ve written a shell script that checks whether two files contain the same data. I do this for a lot of files, and the diff command appears to be the bottleneck in my script.

Here’s the line:

diff -q $dst $new > /dev/null

if ($status) then ...

Is there a faster way to compare the files, perhaps a custom algorithm instead of the default diff?

Asked by JDS

Solution #1

I believe cmp will stop at the first byte difference:

cmp --silent $old $new || echo "files are different"

Answered by Alex Howansky

Solution #2

I like that @Alex Howansky used cmp --silent for this. However, I need both a positive and a negative response, so I use:

cmp --silent file1 file2 && echo '### SUCCESS: Files Are Identical! ###' || echo '### WARNING: Files Are Different! ###'

I can run this in the terminal or over ssh to check files against a constant file.
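Checking many files against one constant file can be wrapped in a loop. The following is a minimal sketch, not part of the original answer: the function name report_against_reference is hypothetical, and -s is the short form of --silent.

```shell
#!/usr/bin/env bash
# report_against_reference is a hypothetical helper name.
# Usage: report_against_reference REFERENCE FILE...
report_against_reference() {
  local reference="$1" f
  shift
  for f in "$@"; do
    # cmp -s prints nothing; only its exit status is used here.
    # Any non-zero status (difference or error) is reported as "different".
    if cmp -s -- "$reference" "$f"; then
      printf 'identical: %s\n' "$f"
    else
      printf 'different: %s\n' "$f"
    fi
  done
}
```

It could be called as report_against_reference golden.conf host1.conf host2.conf, where the file names are placeholders.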

Answered by pn1 dude

Solution #3

To compare any two files quickly and safely:

if cmp --silent -- "$FILE1" "$FILE2"; then
  echo "files contents are identical"
else
  echo "files differ"
fi

It’s readable, efficient, and works with any file name, including names that contain quotes, $, or parentheses.

Answered by VasiliNovikov

Solution #4

I don’t have enough reputation points to add this tidbit as a comment, so I’m posting it as an answer.

However, if you’re going to use cmp (and don’t need or want verbose output), you can simply check the exit status. According to the cmp man page, the exit status is 0 if the inputs are the same, 1 if they differ, and 2 if there was trouble.

As an example, you could do the following:

STATUS="$(cmp --silent -- "$FILE1" "$FILE2"; echo $?)"  # "$?" gives the exit status of the comparison

if [[ $STATUS -ne 0 ]]; then  # if status isn't equal to 0, then execute code
  echo "files differ"  # placeholder action
fi

EDIT: Thank you all for your feedback! The test syntax has been updated. If you’re looking for something similar to my answer but better in readability, style, and syntax, I recommend Vasili’s solution.
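Because cmp exits with 0 for identical inputs, 1 for differing inputs, and 2 when something went wrong, the three cases can be handled separately. This is a rough sketch under the assumption that a small wrapper is acceptable; the name compare_files is hypothetical and not part of cmp.

```shell
#!/usr/bin/env bash
# compare_files is a hypothetical helper name, not part of cmp.
compare_files() {
  local rc=0
  # -s is the short form of --silent: no output, exit status only.
  cmp -s -- "$1" "$2" || rc=$?
  case $rc in
    0) echo "identical" ;;
    1) echo "different" ;;
    *) echo "error" ;;   # e.g. a file is missing or unreadable
  esac
}
```

Treating status 2 as its own case avoids reporting a missing or unreadable file as a mere difference.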

Answered by Gregory Martin

Solution #5

For files that are not different, any method requires reading both files completely, even if the read happened in the past.

There is no alternative. Creating hashes or checksums at some point therefore also requires reading the entire file, and large files take time to process.

File metadata retrieval is much faster than reading a large file.

Can any metadata tell you that the files must be different? The file size, for example? Or even the output of the file command, which reads only a small part of the file?

Example code segment for file size:

  ls -l "$1" "$2" |
  awk 'NR==1{a=$5} NR==2{b=$5} 
       END{val=(a==b)?0 :1; exit( val) }'

[ $? -eq 0 ] && echo 'same' || echo 'different'  

You’re stuck with complete file reads if the files are the same size.
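The size check and the full comparison can be combined so that file contents are read only when the sizes match. This is a sketch rather than the answer's own code: the names file_size and same_contents are hypothetical, and it assumes either GNU stat (stat -c %s) or BSD stat (stat -f %z) is available.

```shell
#!/usr/bin/env bash
# file_size and same_contents are hypothetical helper names.

# Print the size of a file in bytes using metadata only.
# Assumption: GNU stat (-c %s) or, failing that, BSD stat (-f %z).
file_size() {
  stat -c %s -- "$1" 2>/dev/null || stat -f %z -- "$1" 2>/dev/null
}

# Return 0 if identical, 1 if different, 2 if a size could not be read.
same_contents() {
  local size1 size2
  size1=$(file_size "$1") || return 2
  size2=$(file_size "$2") || return 2
  # Different sizes: the contents must differ, so skip reading the files.
  [ "$size1" -eq "$size2" ] || return 1
  # Same size: a byte-by-byte comparison is unavoidable.
  cmp -s -- "$1" "$2"
}
```

It could be used as: if same_contents "$dst" "$new"; then echo same; else echo different; fi.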

Answered by jim mcnamara
