I have a large file containing nearly 250 million characters. I want to split it into parts of 30 million characters each (so the first 8 parts will contain 30 million characters and the last part will contain 10 million). Additionally, I want to include the last 1000 characters of each part at the beginning of the next part (i.e., part 1's last 1000 characters are prepended to part 2, so part 2 contains 30 million + 1000 characters, and so on). Can anybody help me do this programmatically (in Java) or with Linux commands (in a fast way)?
One way is to use standard Unix commands to split the file and then prepend the last 1000 bytes of each part to the next.
First split the file:
split -b 30000000 inputfile part.
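To see how split names and sizes the pieces, here is a scaled-down sketch: 100 bytes stand in for the 250-million-character file and 30-byte chunks stand in for 30 million, so the sizes and file names here are illustrative assumptions, not the real ones.

```shell
#!/bin/sh
# Scaled-down stand-in: 100 bytes instead of 250 million characters.
head -c 100 /dev/urandom > inputfile

# Split into 30-byte chunks; with the default two-letter suffix this
# produces part.aa part.ab part.ac part.ad
split -b 30 inputfile part.

# Sizes follow the same "8 x 30M + 10M" pattern, scaled down: 30 30 30 10
wc -c part.*
```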
Then, for each part (ignoring the first), make a new file starting with the last 1000 bytes of the previous one:
unset prev
for i in part.*
do
  if [ -n "${prev}" ]
  then
    tail -c 1000 "${prev}" > part.temp
    cat "${i}" >> part.temp
    mv part.temp "${i}"
  fi
  prev=${i}
done
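As a sanity check, the loop can be exercised on a scaled-down example (100 bytes and a 5-byte overlap standing in for the real sizes; these numbers are assumptions for illustration). Note that taking the tail of the already-modified previous part is still correct, because prepending bytes to a file does not change its tail:

```shell
#!/bin/sh
# Scaled-down example: 100-byte input, 30-byte parts, 5-byte overlap.
head -c 100 /dev/urandom > inputfile
split -b 30 inputfile part.

# Prepend the previous part's last 5 bytes to each part except the first.
unset prev
for i in part.*
do
  if [ -n "${prev}" ]
  then
    tail -c 5 "${prev}" > part.temp
    cat "${i}" >> part.temp
    mv part.temp "${i}"
  fi
  prev=${i}
done

# Every part except the first is now 5 bytes longer: 30 35 35 15
wc -c part.*
```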
Before reassembling, we iterate over the files again, ignoring the first, and throw away the first 1000 bytes of each:
unset prev
for i in part.*
do
  if [ -n "${prev}" ]
  then
    tail -c +1001 "${i}" > part.temp
    mv part.temp "${i}"
  fi
  prev=${i}
done
The last step is to reassemble the files (note > rather than >>, so an existing newfile is overwritten instead of appended to):
cat part.* > newfile
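Putting the three steps together, the whole round trip can be verified with cmp on a scaled-down example (a 100-byte file and a 5-byte overlap standing in for the real sizes; these numbers are assumptions for illustration):

```shell
#!/bin/sh
# Scaled-down round trip: 100 bytes and a 5-byte overlap
# stand in for 250M characters and 1000 bytes.
head -c 100 /dev/urandom > inputfile
split -b 30 inputfile part.

# Add the overlap: prepend the previous part's tail to each part but the first.
unset prev
for i in part.*
do
  if [ -n "${prev}" ]
  then
    tail -c 5 "${prev}" > part.temp
    cat "${i}" >> part.temp
    mv part.temp "${i}"
  fi
  prev=${i}
done

# Strip the overlap again: drop the first 5 bytes of each part but the first.
unset prev
for i in part.*
do
  if [ -n "${prev}" ]
  then
    tail -c +6 "${i}" > part.temp
    mv part.temp "${i}"
  fi
  prev=${i}
done

# Reassemble and verify byte-for-byte equality with the original.
cat part.* > newfile
cmp inputfile newfile && echo "round trip OK"
```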
Since there was no explanation of why the overlap was needed, I just created it and then threw it away.