# Finding and deleting duplicate files

I needed to remove all duplicates in a collection of hundreds of thousands of files. I first came across this, which generates a script with commented-out rm statements, but I quickly found another tool, fdupes, which made life a lot easier for me; I didn’t want manual control. I just wanted all the duplicates deleted, except one of each set.

Fdupes has a feature to omit the first file in each set of duplicates. So I made a simple script which finds all duplicates, omits the first of each set, and puts an rm statement before each file name:

```sh
#!/bin/sh
OUTF='rm-dups.sh'

# Start the generated script with a shebang line.
echo "#!/bin/sh" > "$OUTF"
# Recursively list all duplicates, omitting the first file of each set,
# and prefix every remaining filename with an rm command.
fdupes -r -f . | sed -r 's/(.+)/rm \1/' >> "$OUTF"
chmod +x "$OUTF"
```
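To illustrate what the sed step does: `fdupes -r -f .` prints each duplicate set minus its first file, one filename per line, with a blank line between sets. A minimal sketch, where printf stands in for fdupes and the filenames are made up:

```sh
#!/bin/sh
# printf simulates the output of `fdupes -r -f .` for two duplicate sets
# (the filenames are hypothetical). The sed from the script above turns
# each non-empty line into an rm command; blank set separators pass
# through unchanged.
printf '%s\n' './copy-of-a.txt' './a (backup).txt' '' './b-copy.jpg' |
  sed -r 's/(.+)/rm \1/'
```

This prints `rm ./copy-of-a.txt`, `rm ./a (backup).txt`, a blank line, and `rm ./b-copy.jpg`.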

This script in turn generates another script, which can then be executed, though not before checking that it is actually correct, of course.
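One caveat that the checking step should catch: the generated rm lines do not quote the filenames, so any name containing spaces would be word-split by the shell. A hedged alternative sketch (my own variation, not from the original post) skips the intermediate script and feeds the fdupes output through a read loop, so each filename reaches rm as a single argument:

```sh
#!/bin/sh
# Delete duplicates directly instead of generating a script.
# IFS= and read -r keep each line intact (spaces, backslashes and all),
# and the blank lines that separate duplicate sets are skipped.
fdupes -r -f . | while IFS= read -r f; do
  [ -n "$f" ] && rm -f -- "$f"
done
```

If your fdupes build supports it, `fdupes -rdN .` achieves the same in one step: delete without prompting, preserving the first file of each set.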

## 1 Comment

1. Comment by Rowan Rodrik
On December 28, 2008 at 20:43

Glad you blogged about this. Now I don’t have to anymore. 🙂 Also, the script I mentioned to you seems far inferior to the solution that you found.