Find duplicate files

fdf - a pretty clever Perl script to find duplicate files across directories.

It is very simple and very fast. It doesn't use any checksums. It reads (through the File::Compare module) chunks of data from two files and compares them. As soon as two chunks differ, we already know the files can't be duplicates of each other, so the comparison stops early.
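The chunk-by-chunk comparison can be sketched as follows. This is a minimal Python illustration of the idea behind File::Compare, not the script's own code; the function name and chunk size are assumptions.

```python
CHUNK_SIZE = 8192  # hypothetical chunk size; File::Compare uses its own default

def files_equal(path_a, path_b):
    """Compare two files chunk by chunk, stopping at the first difference."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(CHUNK_SIZE)
            chunk_b = b.read(CHUNK_SIZE)
            if chunk_a != chunk_b:
                return False  # first mismatch: the files can't be duplicates
            if not chunk_a:   # both files exhausted with no mismatch
                return True
```

Because the function returns at the first differing chunk, comparing two large files that differ near the beginning costs only one read per file.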

This is one reason why it is fast. The other is that duplicate detection is transitive: if we have three duplicate files, say "A", "B" and "C", and we find that "A" equals "B" and that "A" equals "C", we don't have to compare "B" with "C". Both are duplicates of "A", so they must be identical to each other.
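The transitivity argument amounts to comparing each file against only one representative per group. A minimal Python sketch of that grouping step (the helper name and the `same` callback are hypothetical, not part of the script):

```python
def group_duplicates(paths, same):
    """Group paths into duplicate sets. Each new path is compared against
    one representative per existing group; transitivity makes any further
    comparisons within a group redundant."""
    groups = []
    for path in paths:
        for group in groups:
            if same(group[0], path):  # compare against the representative only
                group.append(path)
                break
        else:
            groups.append([path])     # no match: start a new group
    return groups
```

For n duplicates of the same file this does n - 1 comparisons instead of the n(n - 1)/2 an all-pairs approach would need.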

Yet another reason: files are grouped by size, so two files are compared only if they have the same size. (Getting the size of a file is a very simple and fast check.)
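The size pre-filter can be sketched like this in Python (again an illustration of the technique, with an assumed function name, rather than the Perl script's code):

```python
import os
from collections import defaultdict

def group_by_size(paths):
    """Bucket files by size; only files inside the same bucket can
    possibly be duplicates of each other."""
    buckets = defaultdict(list)
    for path in paths:
        buckets[os.path.getsize(path)].append(path)
    # a bucket holding a single file needs no content comparison at all
    return [group for group in buckets.values() if len(group) > 1]
```

Since a unique file size rules out duplication immediately, most files are typically discarded here without a single byte of their content being read.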

The script accepts one optional argument: either '-f' or '-l'.
  • -f : keep only the first file of each duplicate group and delete the rest
  • -l : keep only the last file of each duplicate group and delete the rest
If no argument is specified, the script just prints the duplicate files to the standard output, without deleting them.
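The effect of the two flags on one duplicate group can be sketched as follows. This is a hedged Python illustration of the described behavior; the function name and the convention of returning the files to delete are assumptions.

```python
def files_to_delete(group, mode=None):
    """Given one group of duplicate files, return the files to remove:
    '-f' keeps only the first file, '-l' keeps only the last one,
    and no flag means nothing is deleted (report-only mode)."""
    if mode == "-f":
        return group[1:]   # everything except the first
    if mode == "-l":
        return group[:-1]  # everything except the last
    return []              # no flag: just print, delete nothing
```

For example, on the group ["A", "B", "C"], '-f' would delete "B" and "C", while '-l' would delete "A" and "B".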

The duplicate files are grouped together. Each file in a group is identical to every other file in the same group.