While working on an issue, I came across an interesting case. The StartsWith
method (and maybe some others) returns true in both of the following cases:
"\u0000Example".StartsWith("Example"); // returns true
"\u0000Example".StartsWith("\u0000Example"); // returns true
Repository with this example and unit tests: https://github.com/DeanMilojevic/UnicodeInvestigation
I haven't had time to dig into the implementation of the method, so I was wondering: is this a "bug" or expected behavior?
When I find out more in my free time, I will update the question with additional information.
One of the breaking changes introduced in .NET 5 is the transition from NLS to ICU globalization libraries on Windows. Your post doesn't mention which runtime version you are on, but assuming you are using .NET 5, this behaviour does not appear to be a bug.
To quote the docs:
If you use functions like string.IndexOf(string) without calling the overload that takes a StringComparison argument, you might intend to perform an ordinal search, but instead you inadvertently take a dependency on culture-specific behavior. Since NLS and ICU implement different logic in their linguistic comparers, the results of methods like string.IndexOf(string) can return unexpected values.
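As a small illustration of the quoted point, here is a sketch that runs the same IndexOf search with an explicit ordinal comparison and with an explicit culture-sensitive one. The class name, method names, and sample string are my own choices for illustration and are not taken from the question. The culture-sensitive result is deliberately left without an expected value because it depends on whether the runtime uses NLS or ICU.

```csharp
using System;

// Hypothetical helper, named for illustration only.
public static class NewlineSearch
{
    // Ordinal search: compares UTF-16 code units, independent of culture and of NLS/ICU.
    public static int FindOrdinal(string text, string value) =>
        text.IndexOf(value, StringComparison.Ordinal);

    // Culture-sensitive search: equivalent to calling text.IndexOf(value) with no comparison argument.
    public static int FindCurrentCulture(string text, string value) =>
        text.IndexOf(value, StringComparison.CurrentCulture);

    public static void Main()
    {
        const string text = "Hello\r\nworld!";

        // The '\n' sits at index 6 (after "Hello" and '\r').
        Console.WriteLine(FindOrdinal(text, "\n")); // 6

        // Result may differ between NLS and ICU, so no expected value is shown.
        Console.WriteLine(FindCurrentCulture(text, "\n"));
    }
}
```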
If you take a look at this table, you'll see that the string.StartsWith method uses CurrentCulture comparison by default when passed a string parameter. Since you're doing a culture-specific comparison (by not specifying the StringComparison parameter), it seems that the ICU library, however it goes about its implementation, ignores the null character \u0000, whereas the NLS library seemingly doesn't.
Judging by your code, it looks like what you wanted to do was perform an ordinal search, which can be done using: "\u0000Example".StartsWith("Example", StringComparison.Ordinal), which will correctly return false.
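To make the difference concrete, here is a minimal sketch that puts the culture-sensitive check and the ordinal check side by side. The PrefixCheck class and its method names are hypothetical, introduced only for this example. Only the ordinal results are annotated, since the culture-sensitive result depends on which globalization library (NLS or ICU) the runtime is using.

```csharp
using System;

// Hypothetical helper, named for illustration only.
public static class PrefixCheck
{
    // Culture-sensitive check: the same comparison StartsWith(prefix) uses by default.
    public static bool StartsWithCurrentCulture(string text, string prefix) =>
        text.StartsWith(prefix, StringComparison.CurrentCulture);

    // Ordinal check: compares UTF-16 code units one by one, so \u0000 is never skipped.
    public static bool StartsWithOrdinal(string text, string prefix) =>
        text.StartsWith(prefix, StringComparison.Ordinal);

    public static void Main()
    {
        const string text = "\u0000Example";

        // Depends on NLS vs ICU, so no expected value is shown.
        Console.WriteLine(StartsWithCurrentCulture(text, "Example"));

        Console.WriteLine(StartsWithOrdinal(text, "Example"));        // False
        Console.WriteLine(StartsWithOrdinal(text, "\u0000Example"));  // True
    }
}
```

Passing StringComparison explicitly at every call site also documents the intent, which is what the analyzer rules mentioned in this answer are meant to enforce.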
It is recommended to enable code analyzers in your project to help you identify code that is unexpectedly using a linguistic comparer when an ordinal one was likely intended.
The recommended rules to enable are:
CA1307: Specify StringComparison for clarity
CA1309: Use ordinal StringComparison
CA1310: Specify StringComparison for correctness
To enable these code analysis rules and have them cause build errors, simply add the following to your project file:
<PropertyGroup>
  <AnalysisMode>AllEnabledByDefault</AnalysisMode>
  <WarningsAsErrors>$(WarningsAsErrors);CA1307;CA1309;CA1310</WarningsAsErrors>
</PropertyGroup>