UNIX untar content into multiple folders

I have a tar.gz file about 13GB in size. It contains about 1.2 million documents. When I untar it, all these files end up in one single directory, and any read from this directory takes ages. Is there any way I can split the files from the tar into multiple new folders?

For example, I would like to create new folders named [1,2,...], each holding 1000 files.

Srikar Appalaraju asked Sep 06 '25 16:09


1 Answer

This is a quick and dirty solution but it does the job in Bash without using any temporary files.

i=0                                 # file counter
dir=0                               # folder name counter
mkdir "$dir"
tar -tzvf YOURFILE.tar.gz |
cut -d ' ' -f12 |                   # get the filenames contained in the archive
while read -r filename
    do
        i=$((i+1))
        if [ "$i" -eq 1000 ]        # new folder for every 1000 files
        then
            i=0                     # reset the file counter
            dir=$((dir+1))
            mkdir "$dir"
        fi
        tar -C "$dir" -xvzf YOURFILE.tar.gz "$filename"
    done

The same as a one-liner:

i=0; dir=0; mkdir "$dir"; tar -tzvf YOURFILE.tar.gz | cut -d ' ' -f12 | while read -r filename; do i=$((i+1)); if [ "$i" -eq 1000 ]; then i=0; dir=$((dir+1)); mkdir "$dir"; fi; tar -C "$dir" -xvzf YOURFILE.tar.gz "$filename"; done

Depending on your tar version and its verbose output format, the "cut -d ' ' -f12" part for retrieving the last column (the filename) of tar's content listing could pick the wrong field, and you would have to adjust the field number.
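If the field number gives you trouble, one alternative (a sketch, not part of the original answer) is to skip the verbose listing entirely: `tar -tzf` without `-v` prints just one member name per line, so no `cut` is needed at all:

```shell
# List member names only (no permission/owner/size columns),
# one per line, so the fragile cut field index goes away.
tar -tzf YOURFILE.tar.gz |
while read -r filename
do
    echo "$filename"   # here you would run: tar -C "$dir" -xzf YOURFILE.tar.gz "$filename"
done
```

The loop body stays the same; only the listing command changes.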

It worked with 1000 files, but since you have 1.2 million documents in the archive, consider testing this with something smaller first.

lecodesportif answered Sep 08 '25 11:09