I want to make sure that C# string does not contain specific characters.
I am using string.IndexOfAny(char[]), in my opinion Regex would be slower in this task. Is there a better way to accomplish this? Speed is critical in my application.
Ran a quick benchmark on IndexOf vs IndexOfAny vs Regex vs Hashset.
500 word lorem ipsum haystack, with two character needle.
Tested with both needles in haystack, one in haystack, neither in haystack.
private long TestIndexOf(string haystack, char[] needles)
{
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
for (int i = 0; i < 1000000; i++)
{
int x = haystack.IndexOfAny(needles);
}
sw.Stop();
return sw.ElapsedMilliseconds;
}
private long TestRegex(string haystack, char[] needles)
{
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
Regex regex = new Regex(string.Join("|", needles));
for (int i = 0; i < 1000000; i++)
{
Match m = regex.Match(haystack);
}
sw.Stop();
return sw.ElapsedMilliseconds;
}
private long TestIndexOf(string haystack, char[] needles)
{
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
for (int i = 0; i < 1000000; i++)
{
int x = haystack.IndexOf(needles[0]);
}
sw.Stop();
return sw.ElapsedMilliseconds;
}
private long TestHashset(string haystack, char[] needles)
{
HashSet<char> specificChars = new HashSet<char>(needles.ToList());
System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
sw.Start();
for (int i = 0; i < 1000000; i++)
{
bool notContainsSpecificChars = !haystack.Any(specificChars.Contains);
}
sw.Stop();
return sw.ElapsedMilliseconds;
}
Result for 1,000,000 iterations:
Index of : 28/2718/2711
Index of Any : 153/141/17561
Regex of : 1068/1102/92324
Hashset : 939/891/111702
Notes:
Overall, regex is slower then indexofany by upto 10x depending on the haystack and needle sizes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With