Coder Perfect

du counting hardlinks towards filesize?


I have a backup system that produces directories named after Unix timestamps, then creates incremental backups using a hardlink system (--link-dest in rsync), so the first backup is often very large, and subsequent backups are fractions of that size.

This is the result of my most recent backups:

root@athos:/media/awesomeness_drive# du -sh lantea_home/*
31G lantea_home/1384197192
17M lantea_home/1384205953
17M lantea_home/1384205979
17M lantea_home/1384206056
17M lantea_home/1384206195
17M lantea_home/1384207349
3.1G lantea_home/1384207678
14M lantea_home/1384208111
14M lantea_home/1384208128
16M lantea_home/1384232401
15G lantea_home/1384275601
43M lantea_home/1384318801

Everything appears to be in order, but consider the last directory, lantea_home/1384318801:

root@athos:/media/awesomeness_drive# du -sh lantea_home/1384318801/
28G lantea_home/1384318801/

Why does the second du command report this directory as 28G every time I run it, when the first command reported it as 43M?

Note that the -P and -L options have no effect on the output.

Asked by Dan LaManna

Solution #1

Hardlinks are real links to the same file (represented by its inode). There is no distinction between the "original" file and a hard link that points to it. Both names are references to the same file and have equal status. When one of them is removed, the other is left intact. Only removing the last hardlink results in the file being deleted and the disk space being freed.
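A minimal sketch of that behaviour, assuming a POSIX shell and a filesystem that supports hard links. The temporary directory and the file names original and hardlink are made up for illustration:

```shell
#!/bin/sh
# Hypothetical scratch area; nothing here comes from the backup in the question.
tmp=$(mktemp -d)
printf 'backup data\n' > "$tmp/original"
ln "$tmp/original" "$tmp/hardlink"   # second name for the same inode

# ls -i prints the inode number in front of the name; both names should match.
inode_a=$(ls -i "$tmp/original" | awk '{print $1}')
inode_b=$(ls -i "$tmp/hardlink" | awk '{print $1}')
echo "inodes: $inode_a $inode_b"

# Removing one name leaves the data reachable through the other.
rm "$tmp/original"
content=$(cat "$tmp/hardlink")
echo "still readable: $content"

rm -rf "$tmp"
```

The data only goes away once the last remaining name is removed, which is why deleting an old backup directory does not break the newer ones.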

So, if you ask du what it sees in a single directory, it doesn't care whether there are hardlinks to the same contents elsewhere. It just adds the sizes of all the files together. Only hardlinks to the same file within the directories it was asked about are counted once. du is that clever (not all programs necessarily need to be).

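So in effect, directory A might have a du size of 28G, directory B might have a size of 29G, but together they still only occupy 30G, and if you ask du for the size of A and B in one invocation, you will get that number. That is also why the first command shows only 43M for the last directory: by the time du reached it, most of its files had already been counted under an earlier directory in the same run.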
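The effect can be reproduced on a small scale. This is only a sketch: the directories A and B and the 1 MiB file big are hypothetical stand-ins for two backup snapshots, and the exact numbers depend on the filesystem:

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir "$tmp/A" "$tmp/B"
# 1 MiB of random data in A, hard-linked into B (as rsync --link-dest would do).
dd if=/dev/urandom of="$tmp/A/big" bs=1024 count=1024 2>/dev/null
ln "$tmp/A/big" "$tmp/B/big"
sync   # flush, so block counts are settled before measuring

# Separate invocations: each directory is charged the full file.
size_a=$(du -sk "$tmp/A" | awk '{print $1}')
size_b=$(du -sk "$tmp/B" | awk '{print $1}')

# One invocation: the inode is counted for A and skipped for B.
together=$(du -sk "$tmp/A" "$tmp/B")
combined=$(echo "$together" | awk '{s += $1} END {print s}')

echo "A alone: ${size_a}K, B alone: ${size_b}K"
echo "A and B in one run:"
echo "$together"
echo "combined: ${combined}K"

rm -rf "$tmp"
```

In the combined run, B should show only a few kilobytes (its directory entry), which mirrors the 43M versus 28G difference in the question.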

Answered by Alfe

Solution #2

The switch "-l" (--count-links in GNU du) counts hard-linked files every time they are encountered, so each subdirectory shows the total size of that backup, not just the increment delta.
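A short comparison of the two modes, assuming a du that supports -l (GNU coreutils does; other implementations may differ). As before, A, B and big are hypothetical names:

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir "$tmp/A" "$tmp/B"
dd if=/dev/urandom of="$tmp/A/big" bs=1024 count=1024 2>/dev/null
ln "$tmp/A/big" "$tmp/B/big"
sync   # flush, so block counts are settled before measuring

# Default: a hard-linked file is counted once per invocation.
default_sum=$(du -sk "$tmp/A" "$tmp/B" | awk '{s += $1} END {print s}')

# With -l: the file is counted under every directory it appears in.
linked_sum=$(du -skl "$tmp/A" "$tmp/B" | awk '{s += $1} END {print s}')

echo "default: ${default_sum}K, with -l: ${linked_sum}K"

rm -rf "$tmp"
```

Note that the sum of the -l figures overstates the real disk usage; it answers "how big is each snapshot if restored on its own", not "how much space does the whole backup set occupy".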

Answered by Tobias
