I'm trying to create a custom IEqualityComparer for an Intersect to compare strings using a Regex, and I really just don't get what it's doing.
Here's the code, from a LinqPad test.
Comparer:
    public class RegexEqualityComparer : EqualityComparer<string>
    {
        public RegexEqualityComparer(string pattern, RegexOptions options = RegexOptions.None)
        {
            _re = new Regex(pattern, options);
        }

        public RegexEqualityComparer(Regex re)
        {
            _re = re;
        }

        public override bool Equals(string x, string y)
        {
            bool res = false;
            if (Object.ReferenceEquals(x, y))
                res = true;
            else if (x != null && y != null)
                res = _re.IsMatch(x) && _re.IsMatch(y);
            String.Format("RES: {0}, {1} = {2}", new object[] { x, y, res }).Dump();
            return res;
        }

        public override int GetHashCode(string obj)
        {
            return obj.GetHashCode();
        }

        private Regex _re;
    }
Called with:
    RegexEqualityComparer comparer = new RegexEqualityComparer(@"^-");
    new string[] { "1", "-4" }.Intersect(new string[] { "1", "-" }, comparer).Dump();
I am expecting this to give me { "1", "-4" }, since both elements from the first set appear in the second set according to the equality comparer. What it actually gives is:
    RES: 1, 1 = True
    { "1" }
The thing that really confuses me is that, according to the LinqPad Dump() in the comparer, it never even tries to compare the -4 with anything: the sole dump is that RES: 1, 1 = True.
I'm sure I'm missing something obvious here, but can't see it at all!
    public override int GetHashCode(string obj)
    {
        return obj.GetHashCode();
    }
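If Equals(a, b) is true, then it is required that GetHashCode(a) == GetHashCode(b). This is not guaranteed by your EqualityComparer, and this bug means the matching values are not found by the Intersect() call. Intersect() uses a hash-based set internally, so Equals() is only called for items whose hash codes already agree; "-4" and "-" hash differently, which is why the comparer is never even asked about them.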
A sensible enough implementation to correspond with your Equals would be:
    public override int GetHashCode(string obj)
    {
        if (obj == null) return 0;
        if (_re.IsMatch(obj)) return 1;
        return obj.GetHashCode(); // catch the reference-equals part for non-matches.
    }
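To see the effect end to end, here is a minimal sketch of the question's comparer with that GetHashCode swapped in, written as a plain console program rather than a LinqPad query. The IntersectDemo class and its Main method are names I've made up for illustration, and I've dropped the Dump() call and the second constructor to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class RegexEqualityComparer : EqualityComparer<string>
{
    private readonly Regex _re;

    public RegexEqualityComparer(string pattern, RegexOptions options = RegexOptions.None)
    {
        _re = new Regex(pattern, options);
    }

    // Same rule as in the question: same reference, or both strings match the pattern.
    public override bool Equals(string x, string y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return _re.IsMatch(x) && _re.IsMatch(y);
    }

    // All matching strings share one hash code, so Intersect() gets as far as calling Equals on them.
    public override int GetHashCode(string obj)
    {
        if (obj == null) return 0;
        if (_re.IsMatch(obj)) return 1;
        return obj.GetHashCode();
    }
}

// Hypothetical demo class, not part of the original question.
public static class IntersectDemo
{
    public static void Main()
    {
        var comparer = new RegexEqualityComparer(@"^-");
        string[] result = new[] { "1", "-4" }
            .Intersect(new[] { "1", "-" }, comparer)
            .ToArray();
        Console.WriteLine(string.Join(", ", result)); // prints "1, -4"
    }
}
```

Returning a constant for every match gives up any hash distribution among matching strings, but since they are all meant to be equal to each other that is exactly what the contract asks for.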
The fact that two strings with the same characters would be considered non-equal (i.e. it considers new string('1', 1) different to "1") is perhaps deliberate or perhaps a bug. Maybe the ReferenceEquals() should be string's ==?
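If value equality is what you want, a sketch of that variant could look like the following. This assumes ordinal comparison is acceptable; ValueRegexEqualityComparer and ValueEqualityDemo are hypothetical names used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical variant: strings with the same characters are equal, as are any two matches.
public class ValueRegexEqualityComparer : EqualityComparer<string>
{
    private readonly Regex _re;

    public ValueRegexEqualityComparer(string pattern)
    {
        _re = new Regex(pattern);
    }

    public override bool Equals(string x, string y)
    {
        // Ordinal value comparison replaces ReferenceEquals; it also treats two nulls as equal.
        if (string.Equals(x, y, StringComparison.Ordinal)) return true;
        if (x == null || y == null) return false;
        return _re.IsMatch(x) && _re.IsMatch(y);
    }

    public override int GetHashCode(string obj)
    {
        if (obj == null) return 0;
        if (_re.IsMatch(obj)) return 1;
        // Strings with identical characters produce identical hash codes, so this stays consistent with Equals.
        return obj.GetHashCode();
    }
}

public static class ValueEqualityDemo
{
    public static void Main()
    {
        var comparer = new ValueRegexEqualityComparer(@"^-");
        string built = new string('1', 1); // a distinct instance from the literal "1"
        Console.WriteLine(comparer.Equals(built, "1")); // prints "True"
    }
}
```

The GetHashCode body is unchanged from the reference-based version, because two strings that are equal by value either both match the pattern or both fall through to the same string hash.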