I want to convert a set of strings into a regular expression using Java.
I have searched extensively but found no satisfying answer that resolves my issue, so I am asking here.
First, is this possible at all? If so, how would I go about it?
Suppose I have a set of strings such as
abb
abababb
babb
aabb
bbbbabb
...
and I want to produce a regular expression for it, such as
(a+b)*abb
How can this be done?
If you have a collection of strings and want to build a regex that matches any of those strings, you should build a regex that joins them with the | (OR) operator.
Since the strings could contain regex special characters, they need to be quoted.
To make sure the best string matches, you need to list the longest string first, because Java tries the alternatives from left to right and takes the first one that succeeds. E.g. if aba and abax are both on the list, and the text to scan contains abax, we'd want to match on the second string, not the first one.
So, you can do it like this:
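import java.util.Comparator;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public static String toRegex(Iterable<String> strings) {
    return StreamSupport.stream(strings.spliterator(), false)
            .sorted(Comparator.comparingInt(String::length).reversed()) // longest string first
            .map(Pattern::quote)                                        // escape regex special characters
            .collect(Collectors.joining("|"));                          // join as alternatives
}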
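As a rough illustration of how this behaves, here is a minimal, self-contained sketch. The class name AlternationDemo, the helper firstMatch and the sample strings are made up for this example; toRegex is the method shown above.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

final class AlternationDemo {

    // Same approach as above: longest first, each string quoted, joined with |.
    static String toRegex(Iterable<String> strings) {
        return StreamSupport.stream(strings.spliterator(), false)
                .sorted(Comparator.comparingInt(String::length).reversed())
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
    }

    // Hypothetical helper: returns the first match found in text, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // "a.b" contains a regex special character, so quoting matters here.
        String regex = toRegex(Arrays.asList("aba", "abax", "a.b"));
        System.out.println(regex);
        System.out.println(firstMatch(regex, "xx abax yy"));
        System.out.println(Pattern.matches(regex, "a.b"));
        System.out.println(Pattern.matches(regex, "aXb"));
    }
}
```

Because abax is sorted ahead of aba, scanning "xx abax yy" finds abax rather than stopping at aba. Because of the quoting, the dot in a.b is treated literally, so a.b matches but aXb does not.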
What you are looking for is a way to infer a regular expression from a set of examples. This is a non-trivial computing problem to solve for the general case. See this post for details.
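Inferring the pattern automatically is the hard part, but checking a hand-written candidate against the examples is easy. The sketch below only does that check; it does not infer anything. The class name CandidateCheck and the helper matchesAll are made up for this example. It also assumes that the + in (a+b)*abb is meant in the textbook sense of "a or b", which in Java regex syntax is written (a|b)*abb. In Java, a+b means "one or more a followed by b".

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class CandidateCheck {

    // True only if every example, taken as a whole string, matches the candidate regex.
    static boolean matchesAll(String candidateRegex, List<String> examples) {
        Pattern p = Pattern.compile(candidateRegex);
        return examples.stream().allMatch(s -> p.matcher(s).matches());
    }

    public static void main(String[] args) {
        List<String> examples = Arrays.asList("abb", "abababb", "babb", "aabb", "bbbbabb");
        // Textbook union written with Java's | operator.
        System.out.println(matchesAll("(a|b)*abb", examples));
        // The same text read as Java syntax, where + is a quantifier.
        System.out.println(matchesAll("(a+b)*abb", examples));
    }
}
```

With the examples from the question, the first check prints true and the second prints false, since babb cannot be matched when every repetition has to start with an a.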