I'm using the following code to discard unsupported physical interfaces / subinterfaces from routers that connects to a big ISP network (by big I mean tens of thousands of routers):
private final static Pattern INTERFACES_TO_FILTER =
Pattern.compile("unrouted VLAN|GigabitEthernet.+-mpls layer|FastEthernet.+-802\\.1Q vLAN subif");
// Simplification
List<String> interfaces;
// lots of irrelevant code to query the routers
for (String intf : interfaces) {
if (INTERFACES_TO_FILTER.matcher(intf).find()) {
// code to prevent the interface from being used
}
}
The idea is discarding entries such as:
This code is hit often enough (several times per minute) over huge sets of interfaces (some routers have 50k+ subintefaces), cache doesn't really help much either because new subinterfaces are being configured / discarded very often. The plan is to optimize the regex so that the procedure completes a tad faster (every nanosecond counts). Can you guys enlighten me?
Note: mpls layer and 802.1Q are supported for other kinds of interfaces, unrouted VLANs isn't.
There are some string search algorithms that allow you to try to search in a string of length n for k strings at once cheaper than the obvious O(n*k) cost.
They usually compare a rolling hash against a list of existing hashes of your words. A prime example of this would be the Rabin-Karp algorithm. The wiki page even has a section about this. There are more advanced versions of the principle out there as well, but it's easy to understand the principle.
No idea if there already are libraries in Java that do this (I'd think so), but that's what I'd try - although 5 strings is rather small here (and different size makes it more complex too). So better check whether a good KMP string search isn't faster - I'd think that'd be by far the best solution really (the default java api uses a naive string search, so use a lib)
About your regexes: backtracking regex implementation for performance critical search code? I doubt that's a good idea.
PS: If you'd post a testset and a test harness for your problem, chances are good people would see how much they could beat the favorite - has worked before.. human nature is so easy to trick :)
I'm answering my own question for further reference, although the credits goes to @piotrekkr since he was the one that pointed the way. Also my Kudos to @JB and @ratchet. I ended up using matches(), and the logic using indexOf and several contains was almost as fast (that's news to me, I always assumed that a single regex would be faster than several calls to contains).
Here's a solution that is several times faster (according to the profiler, about 7 times less time is spent at Matcher class methods):
^(?:unrouted VLAN.++|GigabitEthernet.+?-mpls layer|FastEthernet.+?-802\\.1Q vLAN subif)$
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With