I have the following file:
file.csv
header:2013/01/01, shasum: 495629218484151218892233214
content:data,a,s,d,f,g,h,j,k,l
content:data,q,w,e,r,t,y,u,i,o,p
content:data,z,x,c,v,b,n,m
footer:2013/01/01 EOF
I need to calculate the hash of the content, i.e. the hash of the file contents without the header and footer, and make sure it matches the one the source provided in the header. I tried reading the file line by line using a Scanner and skipping the header and footer:
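MessageDigest md = MessageDigest.getInstance("SHA-1"); // algorithm assumed from "shasum"
Scanner reader = new Scanner(new FileReader("filename"));
String header = reader.nextLine();       // the first line is the header
while (reader.hasNextLine()) {
    String line = reader.nextLine();
    if (reader.hasNextLine()) {          // skip the last line, which is the footer
        md.update(line.getBytes());
        md.update(NEW_LINE.getBytes());  // nextLine() strips the terminator, so add it back
    }
}
reader.close();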
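The snippet stops at updating the digest; the remaining step from the requirement is comparing the result with the value in the header. A minimal sketch, assuming the header value is the lowercase hex digest that the shasum tool prints and that the algorithm is SHA-1 (shasum's default). HeaderCheck, expectedHash and toHex are hypothetical names, and the header line in main is made up so that it carries a real digest (the SHA-1 of "abc").

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HeaderCheck {

    // Hypothetical helper: returns the text after "shasum:" in a header line
    // such as "header:2013/01/01, shasum: 4956...".
    static String expectedHash(String headerLine) {
        int idx = headerLine.indexOf("shasum:");
        if (idx < 0) {
            throw new IllegalArgumentException("no shasum in header: " + headerLine);
        }
        return headerLine.substring(idx + "shasum:".length()).trim();
    }

    // Renders a digest as lowercase hex, two characters per byte.
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Assumption: SHA-1, as used by shasum when no algorithm is given.
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        md.update("abc".getBytes(StandardCharsets.UTF_8));
        String actual = toHex(md.digest());

        // Made-up header line carrying the real SHA-1 of "abc".
        String header = "header:2013/01/01, shasum: a9993e364706816aba3e25717850c26c9cd0d89d";
        System.out.println(actual.equalsIgnoreCase(expectedHash(header))); // prints: true
    }
}
```

Comparing with equalsIgnoreCase tolerates a header that happens to use uppercase hex.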
The problem is that I don't know where the file comes from; it might come from Windows or from Unix. So how can I know which NEW_LINE to use? To work around that, I have written this dirty hack:
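BufferedReader br = new BufferedReader(new FileReader("filename"));
int i;
while ((i = br.read()) != -1) {
    if (i == '\r') {
        if (br.read() == '\n') {   // "\r\n": Windows line ending
            NEW_LINE = "\r\n";
            break;
        }
    } else if (i == '\n') {        // bare "\n": Unix line ending
        NEW_LINE = "\n";
        break;
    }
}
br.close();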
Basically it looks for the first occurrence of either \r\n or \n and assumes that whichever it finds first is the line separator for the whole file.
This will definitely land me in trouble if the file mixes CRLF and LF. What I could really use is a reader to which I can give two offsets and which returns the content between them, like so:
reader.read(15569, 236952265);
I believe the two offsets I want can be calculated. Any suggestions from the community would be greatly appreciated.
Better than what I suggested in the comments: simply use the RandomAccessFile class!
// Open the data file in read-only mode:
RandomAccessFile randFile = new RandomAccessFile("inputFileName.txt", "r");
// (On your own): calculate startingPoint, the first byte after the header line
// (On your own): calculate endingPoint, the first byte of the footer line
// Discard the header by seeking past it. The footer is skipped by stopping at
// endingPoint; do not call setLength(), which would truncate the file itself.
randFile.seek(startingPoint);
// Discard newlines of any kind as they are read in: readLine() treats "\n",
// "\r\n" and "\r" as terminators and does not return them.
StringBuilder sb = new StringBuilder((int) (endingPoint - startingPoint));
String currentLine;
while (randFile.getFilePointer() < endingPoint
        && (currentLine = randFile.readLine()) != null) {
    sb.append(currentLine);
}
randFile.close();
// Hash the String contained in the StringBuilder without worrying about
// the header, the footer or newlines of any kind.
Note that this code is not production quality: it does not handle exceptions and may have some off-by-one errors. I highly recommend reading the documentation for the RandomAccessFile class: http://docs.oracle.com/javase/1.4.2/docs/api/java/io/RandomAccessFile.html#readLine()
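If the hash in the header was computed over the raw bytes of the content lines, terminators included, a byte-level variant avoids guessing the line ending altogether: find the two offsets, then feed the bytes between them to MessageDigest unchanged, so a mix of CRLF and LF is hashed exactly as stored. This is a swapped-in alternative to the readLine() approach, sketched under assumptions: the header and footer are single lines, every line terminator ends in '\n' (LF or CRLF; a bare CR is not handled), and the algorithm is SHA-1. ContentHash and its methods are hypothetical names, and the sample file in main is made up.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ContentHash {

    // startingPoint: the offset just past the first '\n', i.e. after the header line.
    static long contentStart(RandomAccessFile f) throws IOException {
        f.seek(0);
        int b;
        while ((b = f.read()) != -1) {
            if (b == '\n') {
                return f.getFilePointer();
            }
        }
        throw new IOException("no line terminator after the header");
    }

    // endingPoint: the offset of the first byte of the footer line. Any trailing
    // "\n" or "\r\n" after the footer is skipped, then we walk back to the '\n'
    // that ends the last content line.
    static long contentEnd(RandomAccessFile f) throws IOException {
        long pos = f.length() - 1;
        while (pos >= 0 && isEol(byteAt(f, pos))) {
            pos--;
        }
        while (pos >= 0 && byteAt(f, pos) != '\n') {
            pos--;
        }
        return pos + 1;
    }

    private static int byteAt(RandomAccessFile f, long pos) throws IOException {
        f.seek(pos);
        return f.read();
    }

    private static boolean isEol(int b) {
        return b == '\n' || b == '\r';
    }

    // Hashes the raw bytes in [start, end) in fixed-size chunks.
    static byte[] hashRange(RandomAccessFile f, long start, long end, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        if (end < start) {
            throw new IOException("footer starts before the content does");
        }
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buf = new byte[8192];
        f.seek(start);
        long remaining = end - start;
        while (remaining > 0) {
            int n = f.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n == -1) {
                break; // file is shorter than expected
            }
            md.update(buf, 0, n);
            remaining -= n;
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample file with mixed CRLF and LF line endings.
        String content = "content:data,a,s,d\r\ncontent:data,q,w,e\n";
        String text = "header:2013/01/01, shasum: x\r\n" + content + "footer:2013/01/01 EOF\n";
        File tmp = File.createTempFile("sample", ".csv");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), text.getBytes(StandardCharsets.US_ASCII));

        try (RandomAccessFile f = new RandomAccessFile(tmp, "r")) {
            long start = contentStart(f);
            long end = contentEnd(f);
            byte[] actual = hashRange(f, start, end, "SHA-1");
            byte[] expected = MessageDigest.getInstance("SHA-1")
                    .digest(content.getBytes(StandardCharsets.US_ASCII));
            // prints: 30 69 true
            System.out.println(start + " " + end + " " + MessageDigest.isEqual(actual, expected));
        }
    }
}
```

Reading in 8 KB chunks keeps memory use flat regardless of how large the content range is. If the sender instead hashed the content with newlines stripped, the readLine() version above is the one to use.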
I hope this helps. If I am off base, let me know and I'll give it another shot.