Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

splitting a text into words using bufferReader

I have an issue solving a problem. I have to add ONLY words into a treeset (and output the size of treeset) using bufferedReader but the problem is I cannot pass the compilator speed test limit. The text contains only letters and whitespaces (it can be an empty line). I have to find out a new solution but seems not this :

BufferedReader read = new BufferedReader(new InputStreamReader(System.in));
Set<String> text = new TreeSet<String>();
String words[], line;
while ((line = read.readLine()) != null) {
    words = line.split("\\s+");
    for (int i = 0; i < words.length && words[0].length() > 0; i++) {
        text.add(words[i]);
    }
}
System.out.println(text.size());

Is there any other "split" method to use so the compiler use less "time-thinking"?

like image 980
chris09trt Avatar asked Jun 11 '26 07:06

chris09trt


1 Answers

In line

words = line.split("\\s+");

you split by regex, which is much slower, than splitting by one char (5 times on my machine). Java split String performances

If the words are exactly separated by only one space, then the solution is simple

words = line.split(" ");

just replace with this line and your code will run faster.

If words can be separated by several spaces, then add such a line after the loop

text.remove("");

and still replace your regex split with 1 char split.

public class Test {
    public static void main(String[] args) throws IOException {
        // string contains 1, 2 and two spaces between 1 and 2. text size should be 2
        String txt = "1  2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
            "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
            "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
            "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
            "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n" +
            "1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1";

        InputStream inpstr = new ByteArrayInputStream(txt.getBytes());

        BufferedReader read = new BufferedReader(new InputStreamReader(inpstr));
        Set<String> text = new TreeSet<>();
        String[] words;
        String line;
        long startTime = System.nanoTime();
        while ((line = read.readLine()) != null) {
            //words = line.split("\\s+"); -- runs 5 times slower
            words = line.split(" ");
            for (int i = 0; i < words.length; i++) {
                text.add(words[i]);
            }
        }
        text.remove("");  // add only if words can be separated with multiple spaces

        long endTime = System.nanoTime();
        System.out.println((endTime - startTime) + " " + text.size());
    }
}

Also you can replace your for loop with

text.addAll(Arrays.asList(words));
like image 173
vszholobov Avatar answered Jun 13 '26 20:06

vszholobov



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!