I needed to remove all duplicates in a collection of hundreds of thousands of files. I first came across a tool that generates a script with commented-out rm statements, but I quickly found another tool, fdupes, which made life a lot easier; I didn't want manual control. I just wanted all the duplicates deleted, except one of them.
Fdupes has a feature to omit the first file in each set of duplicates. So I wrote a simple script which finds all duplicates, omits the first file of each set, and puts rm statements before the remaining file names:
#!/bin/sh
OUTF='rm-dups.sh'
echo "#!/bin/sh" > "$OUTF"
fdupes -r -f . | sed -r 's/(.+)/rm \1/' >> "$OUTF"
chmod +x "$OUTF"
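To illustrate the sed step: fdupes with -f prints each duplicate set minus its first file, with sets separated by blank lines. The paths below are made up for the example (the -r flag on sed is the GNU spelling of extended regexes; BSD sed uses -E):

```shell
# Hypothetical sample of `fdupes -r -f .` output: the first file of each
# duplicate set is omitted, sets are separated by a blank line.
printf './a/copy1.txt\n./a/copy2.txt\n\n./b/dup.txt\n' |
sed -r 's/(.+)/rm \1/'
# prints:
# rm ./a/copy1.txt
# rm ./a/copy2.txt
#
# rm ./b/dup.txt
```

The blank lines pass through unchanged, because `.+` only matches non-empty lines; a blank line in the generated script is harmless.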
This script in turn generates another script, rm-dups.sh, which can then be executed, but only after checking that it is actually correct, of course.
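One thing to check for in particular: the generated rm lines are unquoted, so any file name containing spaces would be split into multiple arguments. A variant of the sed step that wraps each path in single quotes handles that case (a sketch, not the original script; names that themselves contain a single quote would still break):

```shell
# Single-quote each path so file names with spaces survive word splitting.
printf './my file (1).txt\n' | sed -r "s/(.+)/rm '\1'/"
# prints: rm './my file (1).txt'
```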