I am looking for an efficient search algorithm, that, for a given set of strings searches a large buffer for any one match from the set of strings. Currently i know a few efficient single-string algorithms (i have used the Knuth before), but i don't know if they really help.
Here is what i am actually doing:
I have looked for multiple-string searching algorithms using a finite set of predefined patterns, but they all seem to revolve around matching ALL of the predefined strings in the buffer.
This post: Fast algorithm for searching for substrings in a string, suggested using the Aho–Corasick or the Rabin–Karp alogirthm.
I thought, that since i only need one match, i could find other methods, that are similar to the mentioned algorithms, but the constrains given by the problem can improve the performance.
Aho-Corasick is a good choice here. After building an automaton the input string is traversed from left to right so it is possible to stop immediately after the first match is found. The time complexity is O(sum of lengths of all patterns + the position of the first occurrence). It is optimal because it is not possible to find the first match without reading all patterns and all the bytes from the buffer before the first occurrence.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With