
Finding and deleting duplicate files

I needed to remove all duplicates in a collection of hundreds of thousands of files. I first came across this, which generates a script with commented-out rm statements, but I quickly found another tool, fdupes, which made life a lot easier for me; I didn’t want manual control, I just wanted all the duplicates deleted except one copy of each.

Fdupes has a feature to omit the first file in each set of duplicates. So I made a simple script that finds all duplicates, omits the first of each set, and puts an rm statement before each remaining file name:

#!/bin/sh
OUTF='rm-dups.sh'

echo '#!/bin/sh' > "$OUTF"
# -r: recurse into subdirectories; -f: omit the first file in each set of
# duplicates. The captured filename is wrapped in quotes so that paths
# containing spaces survive in the generated rm commands.
fdupes -r -f . | sed -r 's/(.+)/rm "\1"/' >> "$OUTF"
chmod +x "$OUTF"
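To see what the sed rewrite produces, it can be run in isolation on a fabricated fdupes-style listing (the filenames below are invented for illustration; the capture is wrapped in quotes so paths with spaces survive):

```shell
#!/bin/sh
# A fabricated two-line listing, as fdupes -r -f might print it.
sample='./photos/img (copy).jpg
./music/track 01.mp3'

# The same sed rewrite as in the generator script, quoting each path.
out=$(printf '%s\n' "$sample" | sed -r 's/(.+)/rm "\1"/')
printf '%s\n' "$out"
# prints:
# rm "./photos/img (copy).jpg"
# rm "./music/track 01.mp3"
```

Without the quotes, a path like `./photos/img (copy).jpg` would be split by the shell into several rm arguments, so the generated script would delete the wrong things or fail.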

This script in turn generates another script, rm-dups.sh, which can then be executed, but not before checking that it is actually correct, of course.
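As an aside, fdupes can also do the deletion directly with `fdupes -r -d -N .` (delete without prompting, keeping the first file of each set), though that skips the review step. And if fdupes is not available at all, the same keep-one-delete-the-rest behaviour can be sketched with standard tools; the files below are throwaway examples created just for the demo:

```shell
#!/bin/sh
# Sketch: emulate fdupes' "keep the first, delete the rest" using md5sum
# and awk. The directory and files are temporary demo data, not real paths.
set -e
dir=$(mktemp -d)
printf 'hello' > "$dir/a.txt"   # duplicate pair, first copy
printf 'hello' > "$dir/b.txt"   # duplicate pair, second copy
printf 'world' > "$dir/c.txt"   # unique file

# Hash every file, sort so identical hashes group together, and print the
# path of every file whose hash has been seen before (i.e. every duplicate
# after the first), then delete those. Note: $2 breaks on paths with
# spaces, so this sketch only handles space-free names.
find "$dir" -type f -exec md5sum {} + \
  | sort \
  | awk 'seen[$1]++ { print $2 }' \
  | while IFS= read -r f; do rm -- "$f"; done

remaining=$(ls "$dir" | wc -l)
echo "$remaining files remain"   # the unique file plus one copy of the pair
```

This keeps a.txt (the first of the duplicate pair in sort order) and c.txt, deleting only b.txt.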


    1 Comment

    1. Comment by Rowan Rodrik
      On December 28, 2008 at 20:43

      Glad you blogged about this. Now I don’t have to anymore. 🙂 Also, the script I mentioned to you seems far inferior to the solution that you found.