
Keep only one version of each file (bash)

Tags:

bash

I want to remove redundant files in a folder. Something like

cat_1.jpg
cat_2.jpg
cat_3.jpg
dog_10.jpg
dog_100.jpg

reduced to

cat_3.jpg
dog_100.jpg

That is, take only the version of each file with the highest number suffix and delete the rest.

This is very much like "list the files with minimum sequence", but the bash answer there uses a "for ... in ..." loop, and I have thousands of file names.

EDIT:

I got the file name convention wrong. There may be other underscores (e.g. cat_and_dog_100.jpg), so only the number after the last underscore should be taken.

Tristan Klassen asked Dec 18 '25 15:12


1 Answer

Assuming your filenames are always in the form <name>_<numbers>.jpg, here's a quick hack:

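# Sort keys "<name> TAB <numbers> TAB" are prepended by sed (\t needs GNU sed),
# sorted by name and then by number descending, and dropped again by cut.
while IFS= read -r filename; do
    prefix=${filename%_*}  # Text before the last underscore
    if [ "$prev_prefix" != "$prefix" ]; then  # new prefix: highest number comes first
        echo "Keeping $filename"
        prev_prefix=$prefix
    else  # same prefix, lower number
        echo "Deleting $filename"
        rm -- "$filename"
    fi
done < <(find . -maxdepth 1 -name "*.jpg" |
         sed -nE 's/^(.*)_([0-9]+)\.jpg$/\1\t\2\t&/p' |
         LC_ALL=C sort -t$'\t' -k1,1 -k2,2nr | cut -f3)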

How this works:

  1. Sorts all *.jpg files first by <name> and then by <numbers> in descending order.
    • All files with the same prefix are grouped together, with the highest <numbers> appearing first.
  2. Iterates through the sorted list and deletes each file unless it starts a new <name>; that first file is the one with the highest <numbers>, so it is kept.
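
As an alternative sketch of the same idea without sorting, a single pass can remember the highest number seen for each prefix. This is not the method above, only an illustration: list_stale is a hypothetical helper name, it assumes bash 4+ (associative arrays), and it only prints the files it would remove. A for loop over a glob runs inside the shell itself, so it is not subject to the argument-length limit that affects external commands.

```shell
#!/usr/bin/env bash
# Sketch: print every *_<number>.jpg in a directory that is NOT the
# highest-numbered file for its prefix. Requires bash 4+.
# list_stale is a hypothetical helper name used only for illustration.
list_stale() {
    local dir=$1 f base prefix num
    local -A best_num=() best_file=()
    for f in "$dir"/*_*.jpg; do
        [ -e "$f" ] || continue              # glob matched nothing
        base=${f%.jpg}                       # strip the extension
        prefix=${base%_*}                    # text before the LAST underscore
        num=${base##*_}                      # text after the LAST underscore
        [[ $num =~ ^[0-9]+$ ]] || continue   # skip names without a numeric suffix
        if [[ -z ${best_file[$prefix]+set} ]]; then
            # first file seen for this prefix
            best_num[$prefix]=$num; best_file[$prefix]=$f
        elif (( 10#$num > 10#${best_num[$prefix]} )); then
            # new maximum: the previous best becomes stale
            printf '%s\n' "${best_file[$prefix]}"
            best_num[$prefix]=$num; best_file[$prefix]=$f
        else
            printf '%s\n' "$f"
        fi
    done
}
```

Running list_stale . first and reading its output before deleting anything is the safer order; the printed names can then be fed to a loop such as: list_stale . | while IFS= read -r f; do rm -- "$f"; done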

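Note that find is used instead of ls *.jpg so we can better handle a large number of files: with ls *.jpg the shell expands the glob into one long argument list, which can fail with "Argument list too long", whereas find receives a single pattern and writes the names to its output one by one.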
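
The streaming pattern can be sketched in isolation as follows. This is only an illustration, not part of the hack above: count_jpgs is a hypothetical helper, and -print0 with read -d '' is an added assumption that keeps names containing spaces or newlines intact.

```shell
#!/usr/bin/env bash
# Sketch: read *.jpg names from find one at a time instead of passing them
# all as command-line arguments. Names are NUL-delimited so that spaces or
# newlines in them survive. count_jpgs is a hypothetical helper name.
count_jpgs() {
    local dir=$1 n=0 f
    while IFS= read -r -d '' f; do
        n=$((n + 1))                 # one iteration per file name
    done < <(find "$dir" -maxdepth 1 -name '*.jpg' -print0)
    printf '%s\n' "$n"
}
```

For example, count_jpgs . prints the number of .jpg files directly inside the current directory, however many there are.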


Disclaimer: This is a rather fragile way of dealing with files and versioning, and should not be adopted as a long term solution. Do heed the comments posted on the question.

Shawn Chin answered Dec 21 '25 05:12