
Split File - Java/Linux

Tags: java, linux

I have a large file containing nearly 250 million characters. I want to split it into parts of 30 million characters each (so the first 8 parts will contain 30 million characters and the last part will contain 10 million). In addition, I want to include the last 1000 characters of each part at the beginning of the next part (that is, part 1's last 1000 characters are prepended to part 2, so part 2 contains 30 million plus 1000 characters, and so on). Can anybody help me do this programmatically (using Java) or with Linux commands, in a fast way?

Arpssss asked Dec 05 '25 20:12



1 Answer

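One way is to use regular Unix commands to split the file and then prepend the last 1000 bytes of the previous part to each part. Note that split -b and tail -c count bytes, not characters, so this matches the question exactly only if the file uses a single-byte encoding.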

First split the file:

split -b 30000000 inputfile part.
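To see what split produces, here is a scaled-down sketch that cuts a 25-byte file into 10-byte pieces instead of 30,000,000-byte ones. The temporary directory, the sample contents, and the variable names are assumptions made for illustration only.

```shell
# Scaled-down demo (hypothetical names): 25 bytes split into 10-byte pieces.
workdir=$(mktemp -d)
printf 'abcdefghijklmnopqrstuvwxy' > "$workdir/inputfile"   # 25 bytes

# Same command as above, with a smaller size and an explicit output prefix.
split -b 10 "$workdir/inputfile" "$workdir/part."

# split names the pieces part.aa, part.ab, ...; the last piece holds the remainder.
count=$(ls "$workdir"/part.* | wc -l | tr -d ' ')
size_aa=$(wc -c < "$workdir/part.aa" | tr -d ' ')
size_ac=$(wc -c < "$workdir/part.ac" | tr -d ' ')
echo "pieces=$count first=$size_aa last=$size_ac"
```

The same pattern applies to the real file: 250 million bytes at 30,000,000 per piece gives eight full pieces and one final piece with the remainder.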

Then, for each part (ignoring the first), make a new file starting with the last 1000 bytes of the previous part:

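unset prev
for i in part.*
do
  # Skip the first part: it has no predecessor to borrow bytes from.
  if [ -n "${prev}" ]
  then
    # Last 1000 bytes of the previous part, followed by the current part.
    tail -c 1000 "${prev}" > part.temp
    cat "${i}" >> part.temp
    mv part.temp "${i}"
  fi
  prev=${i}
done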
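The effect of the overlap loop is easier to check on a tiny input. The following is a minimal sketch with a 3-byte overlap on 10-byte pieces; the temporary directory, the sample contents, and the overlap variable are assumptions for illustration, not part of the original commands.

```shell
# Scaled-down demo (hypothetical names): 3-byte overlap on 10-byte pieces.
workdir=$(mktemp -d)
printf 'abcdefghijklmnopqrstuvwxy' > "$workdir/inputfile"
split -b 10 "$workdir/inputfile" "$workdir/part."

overlap=3
prev=
for i in "$workdir"/part.*
do
  if [ -n "$prev" ]
  then
    # Tail of the previous piece first, then the current piece.
    tail -c "$overlap" "$prev" > "$workdir/tmp"
    cat "$i" >> "$workdir/tmp"
    mv "$workdir/tmp" "$i"
  fi
  prev=$i
done

second=$(cat "$workdir/part.ab")
third=$(cat "$workdir/part.ac")
echo "$second"   # hijklmnopqrst
echo "$third"    # rstuvwxy
```

Although the previous piece has already grown by the time its tail is read, its last bytes are unchanged, so each piece still receives the correct overlap.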

Before reassembling, iterate over the files again, ignoring the first, and throw away the first 1000 bytes of each:

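unset prev
for i in part.*
do
  # The first part never received an overlap, so leave it untouched.
  if [ -n "${prev}" ]
  then
    # Output starts at byte 1001, which drops the 1000 overlap bytes.
    tail -c +1001 "${i}" > part.temp
    mv part.temp "${i}"
  fi
  prev=${i}
done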

The last step is to reassemble the files:

cat part.* >> newfile
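A quick way to gain confidence in the whole round trip is to run it on a small file and compare the result with the original using cmp. This is only a sketch under assumed names (temporary directory, sample contents, a 3-byte overlap), not a test of the full-size file.

```shell
# Scaled-down round trip (hypothetical names): split, overlap, strip, reassemble.
workdir=$(mktemp -d)
printf 'abcdefghijklmnopqrstuvwxy' > "$workdir/inputfile"
split -b 10 "$workdir/inputfile" "$workdir/part."
overlap=3

# Add the overlap to every piece except the first.
prev=
for i in "$workdir"/part.*; do
  if [ -n "$prev" ]; then
    { tail -c "$overlap" "$prev"; cat "$i"; } > "$workdir/tmp"
    mv "$workdir/tmp" "$i"
  fi
  prev=$i
done

# Strip the overlap again from every piece except the first.
first=yes
for i in "$workdir"/part.*; do
  if [ -z "$first" ]; then
    tail -c +"$((overlap + 1))" "$i" > "$workdir/tmp"
    mv "$workdir/tmp" "$i"
  fi
  first=
done

# Reassemble and compare byte for byte with the original.
cat "$workdir"/part.* > "$workdir/newfile"
if cmp -s "$workdir/inputfile" "$workdir/newfile"; then
  result=identical
else
  result=different
fi
echo "$result"
```

The sketch writes with > rather than >> so that a leftover newfile from an earlier run cannot end up in the comparison.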

Since there was no explanation of why the overlap was needed, I just created it and then threw it away.

Roger Lindsjö answered Dec 08 '25 08:12



