Remove duplicate files from a directory
I recently encountered a server holding 8.2 million image files, which had pushed an account over quota.
Most of the files were duplicates, apparently autogenerated over the course of a year.
I needed to remove only the duplicates with confidence, leaving one copy of each file.
Credit: SiegeX at SuperUser
Bash
#!/bin/bash
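# Maps each checksum to the number of files seen with that content
declare -A arr
# Let ** match files in all subdirectories
shopt -s globstar
for file in **; do
    # Skip directories and anything else that is not a regular file
    [[ -f "$file" ]] || continue
    read -r cksm _ < <(md5sum -- "$file")
    # Non-zero count: an identical file was already kept, so delete this one
    if ((arr[$cksm]++)); then
        rm -- "$file"
    fi
done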
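
Before deleting anything on a tree this size, it may be worth previewing what would go. Below is a minimal dry-run sketch of the same checksum approach that only prints the duplicates. It assumes bash 4 or later plus GNU find, sort, and md5sum. The function name list_duplicates is hypothetical and not part of SiegeX's script. Unlike the script above, it walks the tree with find rather than globstar, so hidden files are included too.

```shell
#!/bin/bash
# Dry-run sketch: print duplicate files under a directory, delete nothing.
# list_duplicates is a hypothetical helper name used for illustration.
list_duplicates() {
    local dir=${1:-.} file cksm
    local -A seen=()
    # NUL-delimited names survive spaces and newlines; sorting makes
    # the choice of which copy counts as the "original" predictable.
    while IFS= read -r -d '' file; do
        # Hash via stdin so odd file names cannot alter md5sum's output
        read -r cksm _ < <(md5sum < "$file")
        # Non-zero count: this content was already seen, so report it
        if (( seen[$cksm]++ )); then
            printf '%s\n' "$file"
        fi
    done < <(find "$dir" -type f -print0 | sort -z)
}
```

Calling list_duplicates with the image directory as its only argument prints one path per line. The first file in sorted order for each checksum is treated as the copy to keep and is not listed.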