I have a very large .tar.gz file which I can't extract all together because of lack of space. I would like to extract half of its contents, process them, and then extract the remaining half.
The archive contains several subdirectories, which in turn contain files. When I extract a subdirectory, I need all its contents to be extracted with it.
What's the best way of doing this in bash? Does tar already allow this?
You can also extract files one at a time using
tar -C DESTINATION/dir -zxvf file.tar.gz PATH/to/file/inside_archive
(with GNU tar, put -C before the member names, otherwise the files are extracted into the current directory instead).
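Since the question needs whole subdirectories extracted with their contents, note that the same form works when the member name is a directory: tar extracts everything under it. A self-contained demo (all file and directory names here are invented for illustration):

```shell
# Demo setup (made-up names): build a small archive with two subdirectories
mkdir -p src/sub1 src/sub2 DESTINATION/dir
echo "alpha" > src/sub1/a.txt
echo "beta"  > src/sub2/b.txt
tar -czf file.tar.gz -C src sub1 sub2

# -C before the member name: extract sub1, contents included,
# into DESTINATION/dir; sub2 stays in the archive untouched
tar -C DESTINATION/dir -zxvf file.tar.gz sub1
```

After this, DESTINATION/dir contains sub1/a.txt but nothing from sub2.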
You can wrap a script around this:
1) Keep PATH the same relative to DESTINATION (yes, you can use your own base directory for DESTINATION)
2) You can list the path of every file inside the archive using
tar -ztf file.tar.gz
3) You can use a for loop such as for f in $(tar -ztf file.tar.gz) and define a break condition as per your requirement (note that $(...) word-splits, so this approach breaks on file names containing spaces).
I would have done something like:
#!/bin/bash
mkdir -p ./My_localDir
for files in $(tar -ztf file.tar.gz)   # note: word-splitting breaks on names with spaces
do
    subDir=$(dirname "$files")
    echo "$subDir"
    tar -C ./My_localDir -zxvf file.tar.gz "$files"   # tar recreates ${subDir} under My_localDir
done
$subDir contains the name of each file's subdirectory.
Add a break condition to the loop above according to your requirement.
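One way to write such a break condition is to stop once roughly half the files are out, process them, and rerun for the rest. A minimal self-contained sketch (archive and directory names are made up, and the $(...) word-splitting caveat about spaces in names still applies):

```shell
# Demo setup (made-up names): an archive with four files in two subdirectories
mkdir -p src/d1 src/d2
printf '1' > src/d1/a; printf '2' > src/d1/b
printf '3' > src/d2/c; printf '4' > src/d2/d
tar -czf file.tar.gz -C src d1 d2

# Extract file by file, breaking once half of them are out
total=$(tar -ztf file.tar.gz | grep -cv '/$')   # count files, skipping directory entries
count=0
mkdir -p My_localDir
for f in $(tar -ztf file.tar.gz | grep -v '/$'); do
    tar -C My_localDir -zxf file.tar.gz "$f"
    count=$((count + 1))
    if [ "$count" -ge $((total / 2)) ]; then
        break                                   # stop at the halfway point
    fi
done
echo "extracted $count of $total files"
```

After processing and deleting the first half, the same loop can be rerun with the condition inverted (skip the first half instead of stopping there) to get the remainder.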
You can, for example, extract only the files that match some pattern:
tar -xvzf largefile.tar.gz --wildcards --no-anchored '*.html'
So, depending on the structure of largefile.tar.gz, you can extract the files matching one pattern, process them, delete them to free the space, then extract the files matching another pattern, and so on.
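That cycle can be sketched end to end; the archive contents and patterns below are invented for the demo, and --wildcards / --no-anchored are GNU tar options:

```shell
# Demo setup (invented contents): an archive holding .html and .css files
mkdir -p site
echo '<p>hi</p>' > site/index.html
echo 'body{}'    > site/main.css
tar -czf largefile.tar.gz site

# Round 1: extract only the HTML files, process them, then delete to free space
tar -xzf largefile.tar.gz --wildcards --no-anchored '*.html'
# ... process site/index.html here ...
find site -name '*.html' -delete

# Round 2: extract the files matching the next pattern and repeat
tar -xzf largefile.tar.gz --wildcards --no-anchored '*.css'
```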