 

String.split() keeps a reference to the original char array

I've noticed that Java's String reuses its internal char array in methods such as substring(), so it doesn't have to create a new char array for the new String instance. String has several non-public constructors for this purpose: they take a char array plus two ints giving the range, and build a String instance backed by that array.
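
To make the sharing concrete, here is a small toy model of that layout, where a string is just an offset and a count into a possibly shared char[]. `SlicedString` is a hypothetical class written for illustration, not the JDK's String. Note that this sharing only applies to older JDKs: Java 7 update 6 changed String.substring() to copy the characters instead.

```java
import java.util.Arrays;

// Hypothetical toy model of the old (pre-Java-7u6) String layout:
// a slice records an offset and count into a shared char[].
class SlicedString {
    final char[] value; // backing array, possibly shared with other slices
    final int offset;   // index of this string's first char in value
    final int count;    // number of chars in this string

    SlicedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static SlicedString of(String s) {
        return new SlicedString(s.toCharArray(), 0, s.length());
    }

    // "Shallow" substring: O(1), but keeps the whole backing array reachable.
    SlicedString substring(int begin, int end) {
        return new SlicedString(value, offset + begin, end - begin);
    }

    // "Deep" copy: allocates an array holding only this slice's chars.
    SlicedString copy() {
        return new SlicedString(Arrays.copyOfRange(value, offset, offset + count), 0, count);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SlicedString line = of("name,with,a,very,long,tail");
        SlicedString shallow = line.substring(0, 4);
        SlicedString deep = shallow.copy();
        System.out.println(shallow + " retains " + shallow.value.length + " chars");
        System.out.println(deep + " retains " + deep.value.length + " chars");
    }
}
```

Running it prints `name retains 26 chars` followed by `name retains 4 chars`: both objects represent the same text, but only the copy lets the long array be garbage collected.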

But only today did I find out that split() also reuses the char array of the original String instance. I read loooooong lines from a file, split each one on "," and keep only one very small column for real use. Because every part of a line secretly holds a reference to the loooooong char array, I got an OOM (OutOfMemoryError) very soon.
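
A rough estimate shows why memory runs out so fast. The line count and lengths below are hypothetical (the question doesn't give them) and object headers are ignored; the only point is that each kept column pins its whole line at 2 bytes per char.

```java
// Back-of-the-envelope estimate with hypothetical sizes, ignoring object headers.
class RetainedMemory {
    // Bytes occupied by the char data of `strings` strings of `charsPerString` chars.
    static long bytesFor(long strings, long charsPerString) {
        return strings * charsPerString * Character.BYTES; // a char is 2 bytes
    }

    public static void main(String[] args) {
        long pinned = bytesFor(100_000L, 10_000L); // each column pins a 10,000-char line
        long needed = bytesFor(100_000L, 10L);     // if each kept column is only 10 chars
        System.out.println(pinned / (1024 * 1024) + " MiB pinned vs " + needed / 1024 + " KiB needed");
    }
}
```

With these made-up numbers it prints `1907 MiB pinned vs 1953 KiB needed`, roughly a thousandfold overhead.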

Here is the example code:

ArrayList<String> test = new ArrayList<String>(3000000);
BufferedReader origReader = new BufferedReader(new FileReader(new File(
        "G:\\filewithlongline.txt")));
String line = origReader.readLine();
int i = 0;
while ((line = origReader.readLine()) != null) {
    String name = line.split(",")[0]; // split() takes a String; on older JDKs name shares line's char array
    test.add(name);
    i++;
    if (i % 100000 == 0) {
        System.out.println(name);
    }
}
System.out.println(test.size());

Is there a standard method in the JDK to make sure that every String instance produced by split() is a "real deep copy" rather than a "shallow copy"?

For now I am using a very ugly workaround to force the creation of a new String instance:

ArrayList<String> test = new ArrayList<String>(3000000);
BufferedReader origReader = new BufferedReader(new FileReader(new File(
        "G:\\filewithlongline.txt")));
String line = origReader.readLine();
int i = 0;
while ((line = origReader.readLine()) != null) {
    String name = line.split(",")[0] + "  ".trim(); // i.e. + "": the concatenation forces a new String instance
    test.add(name);
    i++;
    if (i % 100000 == 0) {
        System.out.println(name);
    }
}
System.out.println(test.size());
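
Why the workaround works: the method call binds tighter than `+`, so the expression is really `part + ""`. Since that is not a compile-time constant, JLS §15.18.1 requires the result to be a newly created String. The sketch below (class name hypothetical) only checks that property.

```java
// Shows that part + "  ".trim() is evaluated as part + "" and yields a new object.
class ConcatCopy {
    public static void main(String[] args) {
        String part = "alpha,beta".split(",")[0];
        // "  ".trim() runs first and returns "", so this is part + "".
        // A non-constant concatenation must produce a newly created String.
        String name = part + "  ".trim();
        System.out.println(name.equals(part)); // true: same characters
        System.out.println(name != part);      // true: a distinct String object
    }
}
```

On the older JDKs where split() shares the array, that concatenation goes through a StringBuilder whose toString() copies just the used characters, which is why the trick detaches the column. `new String(part)` expresses the same intent more directly.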
DeepNightTwo asked Jan 29 '26 11:01



2 Answers

The simplest approach is to create a new String directly. This is one of the rare cases where it's a good idea.

String name = new String(line.split(",")[0]); // note the use of ","
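
If you need every column rather than just the first, the same idea can be applied to each element of the array. `splitDetached` is a hypothetical helper name. On JDK 6, `new String(String)` copies only the original's range when its backing array is larger than the string itself; on Java 7u6 and later the pieces already own their characters, so the wrapping is redundant there.

```java
import java.util.Arrays;

class SplitCopy {
    // Hypothetical helper: split, then wrap each piece in new String(...)
    // so no piece keeps the whole line's char array reachable on old JDKs.
    static String[] splitDetached(String line, String regex) {
        String[] parts = line.split(regex);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = new String(parts[i]);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitDetached("a,b,c", ","))); // [a, b, c]
    }
}
```

If only one column is kept, copying just that element (as above) is cheaper than detaching them all.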

An alternative is to parse the file yourself.

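do {
    StringBuilder name = new StringBuilder();
    int ch;
    // read up to the first ',' (assumes each line has one); also stops at EOF or a control char
    while ((ch = origReader.read()) >= 0 && ch != ',' && ch >= ' ') {
        name.append((char) ch);
    }
    if (ch < 0 && name.length() == 0) {
        break; // EOF with nothing read: don't add an empty trailing entry
    }
    test.add(name.toString());
} while (origReader.readLine() != null); // discard the rest of the line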
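
Below is a self-contained variant of the same streaming idea, written as a method over any Reader so it can be tried with a StringReader. `firstColumns` is a hypothetical name. Unlike the loop above it tracks line ends itself, so a line without a ',' yields the whole line, a missing final newline is handled, and the rest of each line is skipped without ever being turned into a String. It assumes '\r' only appears as part of a line ending.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class FirstColumnReader {
    // Hypothetical helper: collects the text before the first ',' on each line.
    // The remainder of each line is consumed char by char and never stored.
    static List<String> firstColumns(Reader in) throws IOException {
        List<String> result = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in); // buffer the char-by-char reads
        StringBuilder buf = new StringBuilder();
        boolean inFirstColumn = true; // still before the first ',' on this line?
        boolean lineStarted = false;  // has the current line produced any char?
        int ch;
        while ((ch = reader.read()) >= 0) {
            if (ch == '\r') {
                continue; // assumes '\r' only occurs in line endings
            }
            if (ch == '\n') {
                if (inFirstColumn) {
                    result.add(buf.toString()); // line had no ',': keep all of it
                }
                buf.setLength(0);
                inFirstColumn = true;
                lineStarted = false;
                continue;
            }
            lineStarted = true;
            if (inFirstColumn) {
                if (ch == ',') {
                    result.add(buf.toString()); // a new String holding only this column
                    inFirstColumn = false;
                } else {
                    buf.append((char) ch);
                }
            }
        }
        if (lineStarted && inFirstColumn) {
            result.add(buf.toString()); // last line had no ',' and no trailing newline
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstColumns(new StringReader("a,b\nccc\n\nd,e,f"))); // [a, ccc, , d]
    }
}
```

For the question's file it would be called as `firstColumns(new FileReader("G:\\filewithlongline.txt"))`, ideally inside try-with-resources so the reader gets closed.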
Peter Lawrey answered Jan 31 '26 01:01



String has a copy constructor you can use for this purpose.

final String name = new String(line.substring(0, line.indexOf(','))); // assumes a ',' is present
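
`line.indexOf(',')` returns -1 when a line has no comma, and `substring(0, -1)` then throws StringIndexOutOfBoundsException. A guarded version might look like this (`firstColumn` is a hypothetical helper name); it also skips split() entirely, so no array of the other columns is built.

```java
class FirstColumn {
    // Hypothetical helper: the text before the first ',' (or the whole line
    // if there is none), wrapped in new String(...) to detach it from line.
    static String firstColumn(String line) {
        int comma = line.indexOf(',');
        String column = comma >= 0 ? line.substring(0, comma) : line;
        return new String(column);
    }

    public static void main(String[] args) {
        System.out.println(firstColumn("id42,rest,of,line")); // id42
        System.out.println(firstColumn("no-comma-here"));     // no-comma-here
    }
}
```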

... or, as Peter suggested, only read up to the ','.

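final StringBuilder buf = new StringBuilder();
do {
  int ch;
  // read up to the first ',' (assumes every line contains one)
  while ((ch = origReader.read()) >= 0 && ch != ',') {
    buf.append((char) ch);
  }
  if (ch < 0 && buf.length() == 0) {
    break; // EOF with nothing read: don't add an empty trailing entry
  }
  test.add(buf.toString()); // toString() creates a new String holding only this column
  buf.setLength(0); // reuse the builder for the next line
} while (origReader.readLine() != null); // discard the rest of the line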
obataku answered Jan 31 '26 00:01



