
Java Random File Access: Get byte offset of line start

I need to randomly access specific records in a text (ASCII) file and then read from there until a specific "stop sequence" (record delimiter) is found. The file contains multi-line records, and each record is separated by the delimiter. Each record also spans a different number of lines! This is a commonly known file format in the specific area of expertise and cannot be changed.

I want to index the file so I can quickly jump to a requested record.

In similar questions like

How to Access string in file by position in Java

and the questions linked from it, the answers always reference the seek() method of various classes like RandomAccessFile. I know about that!

The issue I have is how to get the offset needed for seek! (indexing the file)

BufferedReader does not have a getFilePointer() method or any other way to get the current byte offset from the start of the file. RandomAccessFile has a readLine() method, but its performance is beyond terrible. It's not usable at all for my case.

I would need to read the file line by line and each time the record delimiter is found I need to get the byte offset. How can I achieve this?

beginner_ asked Nov 26 '25 06:11


1 Answer

You can try to subclass the BufferedReader class to remember the read position. But you won't have the seek functionality.

As you mentioned, a record can be multi-line, but all the records are separated by a stop sequence. Given this, you can use RandomAccessFile like this:

  1. Have a byte buffer byte b[] of, let's say, 8k in size (this is for performance reasons).

  2. Read 8k from the file into this buffer and try to find the delimiter. If it is not found, append the data to a StringBuilder or some other structure, then read another block of 8k.

  3. When you find the delimiter, its position is given by the number of bytes processed since the last delimiter found, added to the offset of that last delimiter (you need to do some simple math).

The tricky part will be if the record delimiter is longer than 1 char, because it can then be split across two 8k blocks, but that shouldn't be a big problem.
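The block-wise scan described above can be sketched roughly as follows. This is only an illustration under some assumptions: the class name RecordIndexer and the methods indexRecords and readRecord are hypothetical, the delimiter is passed in including its line terminator (for example "$$$$\n"), and the file is plain ASCII so one char equals one byte. The match state is carried from one block to the next (using a KMP-style failure table), which is one way to handle a delimiter that is longer than one byte and gets split across two reads.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
final class RecordIndexer {

    /** Returns the byte offset at which each record starts. */
    static List<Long> indexRecords(String path, String delimiter, int bufferSize)
            throws IOException {
        byte[] delim = delimiter.getBytes(StandardCharsets.US_ASCII);
        if (delim.length == 0 || bufferSize <= 0) {
            throw new IllegalArgumentException("empty delimiter or bad buffer size");
        }

        // KMP failure table: lets the matcher fall back correctly on a mismatch.
        int[] fail = new int[delim.length];
        for (int i = 1, k = 0; i < delim.length; i++) {
            while (k > 0 && delim[i] != delim[k]) k = fail[k - 1];
            if (delim[i] == delim[k]) k++;
            fail[i] = k;
        }

        List<Long> offsets = new ArrayList<>();
        byte[] buf = new byte[bufferSize];
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long fileLength = raf.length();
            if (fileLength > 0) offsets.add(0L); // first record starts at byte 0

            long blockStart = 0; // absolute file offset of buf[0]
            int matched = 0;     // delimiter bytes matched so far; survives block changes
            int n;
            while ((n = raf.read(buf)) > 0) {
                for (int i = 0; i < n; i++) {
                    byte b = buf[i];
                    while (matched > 0 && b != delim[matched]) matched = fail[matched - 1];
                    if (b == delim[matched]) matched++;
                    if (matched == delim.length) {
                        // The next record starts right after the delimiter.
                        long next = blockStart + i + 1;
                        if (next < fileLength) offsets.add(next);
                        matched = 0;
                    }
                }
                blockStart += n;
            }
        }
        return offsets;
    }

    /** Reads the bytes in [start, end) as ASCII; end is the next record's offset or the file length. */
    static String readRecord(String path, long start, long end) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            byte[] data = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(data);
            return new String(data, StandardCharsets.US_ASCII);
        }
    }
}
```

With an index built this way, fetching record i would come down to a single seek() to offsets.get(i) followed by one bulk read up to offsets.get(i + 1) (or the file length for the last record), so readLine() on RandomAccessFile is never needed. A buffer size of 8192 matches the 8k suggested above.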

Claudiu answered Nov 28 '25 18:11