
Finding and deleting duplicate files

I needed to remove all duplicates in a collection of hundreds of thousands of files. I first came across this, which generates a script with commented-out rm statements, but I quickly found another tool, fdupes, which made life a lot easier for me; I didn’t want manual control. I just wanted all the duplicates deleted, except one of each.

Fdupes has a feature to omit the first file in each set of duplicates. So I made a simple script which finds all duplicates, omits the first of each set, and puts an rm statement before each remaining file name:

OUTF=remove-duplicates.sh   # name of the generated script; not defined in the original snippet

echo "#!/bin/sh" > "$OUTF"
fdupes -r -f . | sed -r 's/(.+)/rm "\1"/' >> "$OUTF"   # quote filenames so spaces survive
chmod +x "$OUTF"

This script in turn generates a second script, which can then be executed, but not before checking that it is actually correct, of course.
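The generated script is easy to audit because each line is a plain rm command. A minimal sketch of just the sed step, run on simulated `fdupes -f` output (the filenames here are hypothetical; with `-f`, the first file of each duplicate set is already omitted and sets are separated by a blank line):

```shell
# Fake fdupes -f output: two files from one duplicate set, a blank
# separator line, then one file from a second set.
printf '%s\n' './img (1).jpg' './backup/img (1).jpg' '' './old notes.txt' |
  sed -r 's/(.+)/rm "\1"/'
```

This prints `rm "./img (1).jpg"` and so on; the blank separator lines pass through unchanged, because `.+` requires at least one character, and a shell comfortably ignores empty lines when the script runs.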

1 Comment

  1. Comment by Rowan Rodrik
     On December 28, 2008 at 20:43

    Glad you blogged about this. Now I don’t have to anymore. 🙂 Also, the script I mentioned to you seems far inferior to the solution that you found.