The following code:
string expression = "(\\{[0-9]+\\})";
RegexOptions options = ((RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline) | RegexOptions.IgnoreCase);
Regex tokenParser = new Regex(expression, options);
MatchCollection matches = tokenParser.Matches("The {0} is a {1} and the {2} is also a {1}");
will match and capture "{0}", "{1}", "{2}" and "{1}".
Is it possible to change it (either the regular expression or the options passed to the Regex) so that it matches and captures only "{0}", "{1}" and "{2}"? In other words, each distinct token should be captured only once.
Here is what I came up with.
private static bool TokensMatch(string t1, string t2)
{
    return TokenString(t1) == TokenString(t2);
}

private static string TokenString(string input)
{
    Regex tokenParser = new Regex(@"(\{[0-9]+\})|(\[.*?\])");
    string[] tokens = tokenParser.Matches(input).Cast<Match>()
        .Select(m => m.Value).Distinct().OrderBy(s => s).ToArray<string>();
    return String.Join(String.Empty, tokens);
}
Note that the regular expression differs from the one in my question because I cater for two types of token: numbered ones delimited by {} and named ones delimited by [].
Regular expressions solve lots of problems, but not every problem. How about using other tools in the toolbox?
// Cast<Match>() is needed where MatchCollection is non-generic (e.g. .NET Framework).
var parameters = new HashSet<string>(
    matches.Cast<Match>().Select(mm => mm.Value));
Or
var parameters = matches.Cast<Match>().Select(mm => mm.Value).Distinct();
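For completeness, here is a minimal self-contained sketch of the de-duplication idea applied to the sentence from the question. The class name TokenDemo and the method name DistinctTokens are made up for illustration, and the pattern is reduced to the numbered {n} tokens only; treat it as one possible way to wire it up, not the only one.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class TokenDemo
{
    // Returns every distinct {n} token found in the input.
    public static string[] DistinctTokens(string input)
    {
        // Matches a numbered token such as {0} or {12}.
        Regex tokenParser = new Regex(@"\{[0-9]+\}");

        return tokenParser.Matches(input)
            .Cast<Match>()          // MatchCollection is non-generic on older frameworks
            .Select(m => m.Value)   // keep only the matched text
            .Distinct()             // drop repeated tokens
            .ToArray();
    }
}
```

Calling TokenDemo.DistinctTokens("The {0} is a {1} and the {2} is also a {1}") yields the three tokens "{0}", "{1}" and "{2}". The regex itself still finds four matches; the duplicate is discarded afterwards by Distinct(), which is why no change to the pattern or to RegexOptions is needed.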