Finding and deleting duplicate files

I needed to remove all duplicates in a collection of hundreds of thousands of files. I first came across this, which generates a script with commented-out rm statements, but I quickly found another tool, fdupes, which made life a lot easier for me. I didn't want manual control; I just wanted all the duplicates deleted, keeping one copy of each.

fdupes has an option (-f) to omit the first file in each set of duplicates. So I made a simple script that finds all duplicates, omits the first file of each set, and puts an rm statement before each remaining file name:

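# OUTF must already hold the path of the script to generate.
echo '#!/bin/sh' > "$OUTF"
# -r recurses into subdirectories, -f omits the first file of each set.
# Each name is single-quoted so spaces and apostrophes survive in the rm line.
fdupes -r -f . | sed -r "s/'/'\\\\''/g; s/(.+)/rm -- '\1'/" >> "$OUTF"
chmod +x "$OUTF"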

This script in turn generates another script, which can then be executed, but of course not before checking that it is actually correct.
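One way to do part of that check automatically is sketched below. The helper name check_rm_script is made up for this illustration and is not part of fdupes. It assumes the generated script contains nothing but the shebang, blank lines, and rm statements, and it does not replace reading through the list yourself.

```shell
#!/bin/sh
# check_rm_script (hypothetical helper): sanity-check a generated deletion script.
# Prints the number of rm statements. Returns non-zero if the script does not
# parse, or if it contains a line that is not the shebang, blank, or an rm statement.
check_rm_script() {
    script=$1
    # sh -n only parses the script; no command in it is executed.
    # This catches, for example, an unbalanced quote from an unquoted file name.
    sh -n "$script" || return 1
    # grep -v succeeds if any line matches none of the three expected patterns.
    if grep -v -e '^#!' -e '^$' -e '^rm ' "$script" >/dev/null; then
        echo "unexpected line in $script" >&2
        return 1
    fi
    # grep -c exits non-zero when the count is 0, so ignore its exit status.
    grep -c '^rm ' "$script" || :
}
```

For example, check_rm_script "$OUTF" prints how many files the script would delete, which can be compared against the number of duplicates you expected before running it.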

    Rowan Rodrik
    December 28, 2008

    Glad you blogged about this. Now I don’t have to anymore. 🙂 Also, the script I mentioned to you seems far inferior to the solution that you found.