
Performance-intensive string splitting and manipulation in Java

What is the most efficient way to split a string by a very simple separator?

Some background:

I am porting a function I wrote in C, which uses a lot of pointer arithmetic, to Java, and it is incredibly slow (still 5x slower after some optimisation). Profiling shows that a lot of that overhead is in String.split.

The function in question takes a host name or ip address and makes it generic:

123.123.123.123->*.123.123.123

a.b.c.example.com->*.example.com

This can be run over several million items on a regular basis, so performance is an issue.

Edit: the rules for converting are thus:

  • If it's an IP address, replace the first part.
  • Otherwise, find the main domain name and make the preceding part generic.

foo.bar.com->*.bar.com

foo.bar.co.uk->*.bar.co.uk

I have now rewritten it using lastIndexOf and substring, working my way in from the back, and the performance has improved by leaps and bounds.

I'll leave the question open for another 24 hours before settling on the best answer for future reference.

Here's what I've come up with now (the IP check is an insignificant step performed before calling this function):

private static String hostConvert(String in) {
    final String [] subs = { "ac", "co", "com", "or", "org", "ne", "net", "ad", "gov", "ed" };

    int dotPos = in.lastIndexOf('.');
    if(dotPos == -1)
        return in;
    int prevDotPos = in.lastIndexOf('.', dotPos-1);
    if(prevDotPos == -1)
        return in;
    CharSequence cs = in.subSequence(prevDotPos+1, dotPos);
    for(String cur : subs) {
        if(cur.contentEquals(cs)) {
            int start = in.lastIndexOf('.', prevDotPos-1);
            if(start == -1 || start == 0)
                return in;
            return "*" + in.substring(start);
        }
    }

    return "*" + in.substring(prevDotPos);
}

If there's any room for further improvement, it would be good to hear.

juhanic asked Dec 17 '25 18:12


1 Answer

Something like this is about as fast as you can make it:

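// Replaces everything before the first dot with "*".
static String starOutFirst(String s) {
    final int K = s.indexOf('.');
    if (K < 0) return s; // no dot: substring(-1) would throw, so return unchanged
    return "*" + s.substring(K);
}
// Replaces everything before the second-to-last dot with "*".
static String starOutButLastTwo(String s) {
    final int K = s.lastIndexOf('.', s.lastIndexOf('.') - 1);
    if (K < 0) return s; // fewer than two dots: nothing to star out
    return "*" + s.substring(K);
}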

Then you can do:

    System.out.println(starOutFirst("123.123.123.123"));
    // prints "*.123.123.123"

    System.out.println(starOutButLastTwo("a.b.c.example.com"));
    // prints "*.example.com"

You may need to use a regex to determine which of the two methods is applicable to a given string.
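As a rough sketch of that dispatch step, something like the following could work. The class name HostGeneralizer, the method generalize, and the IPV4 pattern are made up for illustration. It assumes "IP address" means dotted-quad IPv4 only and does not range-check the octets. The pattern is compiled once, since recompiling it per call would be costly over millions of items. The two helpers are repeated here, with a guard for inputs that have too few dots, so the snippet stands on its own.

```java
import java.util.regex.Pattern;

final class HostGeneralizer {
    // Assumption: only dotted-quad IPv4 counts as an IP address; octet ranges are not checked.
    // Compiled once and reused, because Pattern.compile is expensive in a hot loop.
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    // Replaces everything before the first dot with "*".
    static String starOutFirst(String s) {
        final int k = s.indexOf('.');
        return k < 0 ? s : "*" + s.substring(k);
    }

    // Replaces everything before the second-to-last dot with "*".
    static String starOutButLastTwo(String s) {
        final int k = s.lastIndexOf('.', s.lastIndexOf('.') - 1);
        return k < 0 ? s : "*" + s.substring(k);
    }

    // Picks the helper based on whether the whole input looks like an IPv4 address.
    static String generalize(String host) {
        return IPV4.matcher(host).matches()
                ? starOutFirst(host)
                : starOutButLastTwo(host);
    }
}
```

Calling HostGeneralizer.generalize on "123.123.123.123" takes the IP branch, and on "a.b.c.example.com" it takes the host name branch. Note that this sketch does not handle second-level suffixes such as co.uk; the hostConvert function in the question covers that case.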

polygenelubricants answered Dec 20 '25 08:12


