What is the most efficient way to split a string by a very simple separator?
Some background:
I am porting a function I wrote in C, with a bunch of pointer arithmetic, to Java, and it is incredibly slow (after some optimisation it is still 5x slower). Having profiled it, it turns out a lot of that overhead is in String.split.
The function in question takes a host name or ip address and makes it generic:
123.123.123.123 -> *.123.123.123
a.b.c.example.com -> *.example.com
This can be run over several million items on a regular basis, so performance is an issue.
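For reference, a split-based version of the host name case might look roughly like the sketch below. This is a hypothetical reconstruction, not the code from the port: the class name `SplitBaseline` and the method name `genericViaSplit` are made up for illustration, and it only covers the simple "keep the last two labels" case (no IP handling). It shows where the cost comes from: `String.split` takes a regex, and every call allocates an array plus one new string per label, most of which are thrown away.

```java
public class SplitBaseline {
    // Hypothetical baseline: keep the last two labels and star out the rest.
    public static String genericViaSplit(String host) {
        // split takes a regex, so the dot has to be escaped.
        String[] parts = host.split("\\.");
        if (parts.length < 3) {
            return host; // two labels or fewer: nothing to generalise
        }
        return "*." + parts[parts.length - 2] + "." + parts[parts.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(genericViaSplit("a.b.c.example.com")); // prints "*.example.com"
    }
}
```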
Edit: the rules for converting are thus:
foo.bar.com -> *.bar.com
foo.bar.co.uk -> *.bar.co.uk
I have now rewritten using lastIndexOf and substring to work myself in from the back and the performance has improved by leaps and bounds.
I'll leave the question open for another 24 hours before settling on the best answer for future reference.
Here's what I've come up with now (the IP part is an insignificant check before calling this function):
    private static String hostConvert(String in) {
        // Second-level labels treated as part of the suffix (e.g. the "co" in "co.uk").
        final String[] subs = { "ac", "co", "com", "or", "org", "ne", "net", "ad", "gov", "ed" };
        int dotPos = in.lastIndexOf('.');
        if (dotPos == -1)
            return in;
        int prevDotPos = in.lastIndexOf('.', dotPos - 1);
        if (prevDotPos == -1)
            return in;
        // The label between the last two dots.
        CharSequence cs = in.subSequence(prevDotPos + 1, dotPos);
        for (String cur : subs) {
            if (cur.contentEquals(cs)) {
                // Two-part suffix: keep three labels instead of two.
                int start = in.lastIndexOf('.', prevDotPos - 1);
                if (start == -1 || start == 0)
                    return in;
                return "*" + in.substring(start);
            }
        }
        return "*" + in.substring(prevDotPos);
    }
If there's any room for further improvement, it would be good to hear.
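Two small tweaks that might be worth measuring are sketched below. Neither is guaranteed to show up in a profile. First, the array literal inside the method is rebuilt on every call, so it can be hoisted into a static constant. Second, `String.regionMatches` compares the label in place, so no subsequence object is needed. The class name `HostConvertSketch` and the constant name `SUBS` are assumptions made for this example; the logic is otherwise meant to behave the same as the function above.

```java
public class HostConvertSketch {
    // Hoisted so the array is built once rather than on every call.
    private static final String[] SUBS =
            { "ac", "co", "com", "or", "org", "ne", "net", "ad", "gov", "ed" };

    public static String hostConvert(String in) {
        int dotPos = in.lastIndexOf('.');
        if (dotPos == -1)
            return in;
        int prevDotPos = in.lastIndexOf('.', dotPos - 1);
        if (prevDotPos == -1)
            return in;
        int labelStart = prevDotPos + 1;
        int labelLen = dotPos - labelStart;
        for (String cur : SUBS) {
            // Compare against the label in place instead of extracting it first.
            if (cur.length() == labelLen && in.regionMatches(labelStart, cur, 0, labelLen)) {
                int start = in.lastIndexOf('.', prevDotPos - 1);
                if (start == -1 || start == 0)
                    return in;
                return "*" + in.substring(start);
            }
        }
        return "*" + in.substring(prevDotPos);
    }

    public static void main(String[] args) {
        System.out.println(hostConvert("foo.bar.com"));   // prints "*.bar.com"
        System.out.println(hostConvert("foo.bar.co.uk")); // prints "*.bar.co.uk"
    }
}
```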
Something like this is about as fast as you can make it:
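    static String starOutFirst(String s) {
        final int K = s.indexOf('.');
        // No dot at all: return the input unchanged instead of throwing.
        return K == -1 ? s : "*" + s.substring(K);
    }
    static String starOutButLastTwo(String s) {
        // Index of the second-to-last dot, or -1 if there are fewer than two dots.
        final int K = s.lastIndexOf('.', s.lastIndexOf('.') - 1);
        return K == -1 ? s : "*" + s.substring(K);
    }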
Then you can do:
System.out.println(starOutFirst("123.123.123.123"));
// prints "*.123.123.123"
System.out.println(starOutButLastTwo("a.b.c.example.com"));
// prints "*.example.com"
You may need to use a regex to see which of the two methods is applicable for any given string.
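A minimal sketch of that dispatch, assuming a rough dotted-quad pattern is good enough to tell IP addresses from host names. The class name `HostGeneraliser`, the method name `generalise` and the `IPV4` pattern are assumptions for this example. The pattern does not validate octet ranges, and the two helpers here return the input unchanged when there are too few dots. The pattern is compiled once, since recompiling it per call would be costly over millions of items.

```java
import java.util.regex.Pattern;

public class HostGeneraliser {
    // Compiled once; a rough dotted-quad check that does not validate octet ranges.
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    public static String starOutFirst(String s) {
        final int k = s.indexOf('.');
        return k == -1 ? s : "*" + s.substring(k);
    }

    public static String starOutButLastTwo(String s) {
        final int k = s.lastIndexOf('.', s.lastIndexOf('.') - 1);
        return k == -1 ? s : "*" + s.substring(k);
    }

    // Hypothetical dispatcher: IP addresses lose their first octet,
    // host names keep only their last two labels.
    public static String generalise(String s) {
        return IPV4.matcher(s).matches() ? starOutFirst(s) : starOutButLastTwo(s);
    }

    public static void main(String[] args) {
        System.out.println(generalise("123.123.123.123"));   // prints "*.123.123.123"
        System.out.println(generalise("a.b.c.example.com")); // prints "*.example.com"
    }
}
```

Note that this two-label rule does not cover suffixes such as co.uk; those would still need the extra check from the question.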