
Using BufferedReader to read big files in Java

Tags:

java

I understand there are two ways to read big text files in Java: one uses Scanner and the other uses BufferedReader. With Scanner, I print the JVM's total memory (in MB) after every line:

Scanner reader = new Scanner(new FileInputStream(path));
while (reader.hasNextLine()){
    String tempString = reader.nextLine();
    System.out.println(java.lang.Runtime.getRuntime().totalMemory()/(1024*1024.0));
}

The printed number always stays stable around a certain value.

However, when I use BufferedReader (see the update below), the number is not stable: it may jump suddenly (by about 20 MB) at one line and then stay the same for many lines (around 8000), and the process repeats. Does anyone know why?

UPDATE: I typed the second method (using BufferedReader) incorrectly; here is what it should be:

BufferedReader reader = new BufferedReader(
    new InputStreamReader(new FileInputStream(path)), 5 * 1024 * 1024);
for(String s = null;(s=reader.readLine())!=null; ){
    System.out.println(java.lang.Runtime.getRuntime().totalMemory()/(1024*1024.0));
}

or, using a while loop:

String s;
while ((s=reader.readLine())!=null ){
    System.out.println(java.lang.Runtime.getRuntime().totalMemory()/(1024*1024.0));
}
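For anyone who wants to reproduce the measurement, here is a minimal self-contained sketch of the BufferedReader loop. It assumes the file path is passed as the first command-line argument and decodes the file as UTF-8 (the original code uses the platform default charset), and it samples totalMemory every 5000 lines rather than on every line. The class `MemoryProbe` and the method `readAndSample` are hypothetical names chosen for illustration.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class MemoryProbe {
    // Reads every line and prints "lineNumber---totalMemoryMB" every `interval` lines.
    // Returns the number of lines read.
    static long readAndSample(BufferedReader reader, int interval) throws IOException {
        long lines = 0;
        while (reader.readLine() != null) {
            lines++;
            if (lines % interval == 0) {
                double totalMb = Runtime.getRuntime().totalMemory() / (1024 * 1024.0);
                System.out.println(lines + "---" + totalMb);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the path; UTF-8 is an assumed encoding.
        // try-with-resources closes the reader (and the underlying stream) when done.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(args[0]), StandardCharsets.UTF_8),
                5 * 1024 * 1024)) { // same 5M-char buffer as in the question
            long n = readAndSample(reader, 5000);
            System.out.println("lines read: " + n);
        }
    }
}
```

Run it as `java MemoryProbe big.txt`; sampling every few thousand lines keeps the console output from dominating the run time.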

To be more specific, here are the results of a test reading a 250 MB file (total memory in MB, sampled every 5000 lines):

Scanner case:

Line number   totalMemory (MB)
5000          117.0
10000         112.5
15000         109.5
20000         109.5
25000         109.5
30000         109.5
35000         109.5
40000         109.5
45000         109.5
50000         109.5

BufferedReader case:

Line number   totalMemory (MB)
5000          123.0
10000         155.5
15000         155.5
20000         220.5
25000         220.5
30000         220.5
35000         220.5
40000         220.5
45000         220.5
50000         211.0

However, Scanner is slow, which is why I'm trying to avoid it.

Checking the BufferedReader case more closely, I found that the total memory jumps suddenly at a single, seemingly random line.

asked Apr 09 '26 14:04 by Zheyu Ji



1 Answer

On its own, a Scanner is not particularly well suited to big text files.

Scanner and BufferedReader are not directly comparable. You can wrap a BufferedInputStream in a Scanner; then you have the same buffering, with the Scanner adding much more stream-parsing functionality than just reading lines.
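A minimal sketch of that idea, assuming UTF-8 input: the Scanner reads from a BufferedInputStream, and besides line-by-line reading it can also parse tokens such as integers. The class and method names (`ScannerOverStream`, `scannerFor`, `countLines`, `sumInts`, `bytes`) are hypothetical, chosen only for illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class ScannerOverStream {
    // Assumption: input is UTF-8. The raw stream is wrapped in a BufferedInputStream.
    static Scanner scannerFor(InputStream in) {
        return new Scanner(new BufferedInputStream(in), StandardCharsets.UTF_8.name());
    }

    // Line-oriented use, like the question's Scanner loop.
    static int countLines(InputStream in) {
        int count = 0;
        try (Scanner sc = scannerFor(in)) {
            while (sc.hasNextLine()) {
                sc.nextLine();
                count++;
            }
        }
        return count;
    }

    // Token-oriented use: sums every integer token and skips everything else.
    static int sumInts(InputStream in) {
        int sum = 0;
        try (Scanner sc = scannerFor(in)) {
            while (sc.hasNext()) {
                if (sc.hasNextInt()) {
                    sum += sc.nextInt();
                } else {
                    sc.next(); // discard a non-integer token
                }
            }
        }
        return sum;
    }

    // Helper that turns a string into an in-memory stream for demonstration.
    static InputStream bytes(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(countLines(bytes("3 apples\n5 pears\n"))); // 2
        System.out.println(sumInts(bytes("3 apples\n5 pears\n")));    // 8
    }
}
```

The token methods (`hasNextInt`, `nextInt`, `next`) are the extra functionality a plain BufferedReader does not offer; if you only need lines, that extra machinery is overhead.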

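Looking at totalMemory isn't particularly useful. To quote the Javadoc: "Returns the total amount of memory in the Java virtual machine. The value returned by this method may vary over time, depending on the host environment." That figure is the heap the JVM has currently reserved, not the memory your program is actually using, so it changes in steps whenever the JVM grows or shrinks the heap.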

Try freeMemory instead; it is a little more interesting, because it reflects the garbage-collection cycles that occur every now and then.
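A small sketch of what one might print instead of raw totalMemory. Taking used heap as `totalMemory() - freeMemory()` is a common approximation, not an exact figure; it typically rises as lines are read and drops after each GC, while totalMemory only moves when the heap is resized. The class `HeapSnapshot` and method `snapshotMb` are hypothetical names.

```java
public class HeapSnapshot {
    static final double MB = 1024 * 1024.0;

    // Returns {total, free, used} heap sizes in MB, read one after another.
    static double[] snapshotMb() {
        Runtime rt = Runtime.getRuntime();
        long total = rt.totalMemory(); // heap the JVM has currently reserved
        long free = rt.freeMemory();   // approximate memory available for new objects
        return new double[] {total / MB, free / MB, (total - free) / MB};
    }

    public static void main(String[] args) {
        double[] s = snapshotMb();
        System.out.printf("total=%.1f MB free=%.1f MB used=%.1f MB%n", s[0], s[1], s[2]);
    }
}
```

Calling `snapshotMb()` in place of the question's `println` (for example every 5000 lines) should show the GC activity that totalMemory alone hides.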

Later comment on Scanner being slow: reading a line merely requires scanning the characters for a line separator, and that is how BufferedReader does it. Scanner, however, cranks up a java.util.regex.Matcher for this task (as that fits better into its overall design). Using Scanner just to read lines is breaking a butterfly on a wheel.
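A sketch contrasting the two approaches on the same text. A `StringReader` stands in for the big file here, an assumption made so the example runs without a file. Both methods return the same lines for this input; the difference described above lies in how each one finds the separators, with Scanner going through its regex machinery. `LineReaders`, `withBufferedReader` and `withScanner` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LineReaders {
    // BufferedReader.readLine: looks for '\n', '\r' or "\r\n" in its char buffer.
    static List<String> withBufferedReader(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            for (String s; (s = br.readLine()) != null; ) {
                lines.add(s);
            }
        }
        return lines;
    }

    // Scanner.nextLine: finds each line by regex matching.
    static List<String> withScanner(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner sc = new Scanner(new StringReader(text))) {
            while (sc.hasNextLine()) {
                lines.add(sc.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String text = "alpha\nbeta\r\ngamma";
        System.out.println(withBufferedReader(text)); // [alpha, beta, gamma]
        System.out.println(withScanner(text));        // [alpha, beta, gamma]
    }
}
```

For a real file, the BufferedReader variant is the one to keep; the Scanner variant is only worth it when you also need its token parsing.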

answered Apr 12 '26 02:04 by laune



