Remove duplicate files from a directory

Submitted by Calvin on Wed, 01/15/2020 - 12:35

I recently encountered a server holding 8.2 million image files, which had pushed an account over quota.

The files were mostly duplicates, and the duplicates appeared to have been autogenerated somehow over the course of a year.

I needed to confidently remove only the duplicate files, leaving one copy of each.

Credit: SiegeX at SuperUser

Bash

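#!/bin/bash
# Requires bash 4+ (associative arrays, globstar).
declare -A arr        # checksum -> number of files seen with it
shopt -s globstar     # let ** recurse into subdirectories

for file in **; do
  [[ -f "$file" ]] || continue   # skip directories and other non-files

  # Hash via stdin so unusual filenames cannot alter md5sum's output.
  read -r cksm _ < <(md5sum < "$file")
  # The first file with a given checksum is kept; later ones are removed.
  if ((arr[$cksm]++)); then
    rm -- "$file"
  fi
done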
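
Before deleting anything across millions of files, it may be worth doing a dry run first. The following is a minimal sketch of that idea, not part of SiegeX's original answer: it prints the files that would be removed instead of removing them. The function name list_duplicates is hypothetical, and the sketch assumes GNU find, sort, and md5sum are available. It also swaps the ** glob for find piped through sort -z, so that which copy counts as "first" is predictable.

```shell
#!/bin/bash
# Dry-run sketch: print the files that would be deleted, without deleting.
# list_duplicates is a hypothetical helper name used for illustration only.
list_duplicates() {
  local dir=$1 file cksm
  local -A seen=()   # checksum -> number of files seen with it

  # NUL-delimited output copes with spaces and newlines in filenames;
  # sort -z fixes the order, so the first path in sort order is the one kept.
  while IFS= read -r -d '' file; do
    # Hash via stdin so the filename never appears in md5sum's output.
    read -r cksm _ < <(md5sum < "$file")
    if (( seen[$cksm]++ )); then
      printf '%s\n' "$file"
    fi
  done < <(find "$dir" -type f -print0 | sort -z)
}
```

Running list_duplicates /path/to/images > duplicates.txt gives a list that can be spot-checked before the real deletion is run.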