
IEnumerable.Intersect with custom comparer, don't understand behaviour

Tags: c#, linq

I'm trying to create a custom IEqualityComparer for an intersect to compare strings using a Regex, and I really just don't get what it's doing.

Here's the code, from a LinqPad test.

Comparer:

public class RegexEqualityComparer : EqualityComparer<string>
{
    public RegexEqualityComparer(string pattern, RegexOptions options = RegexOptions.None)
    {
        _re = new Regex(pattern, options);
    }

    public RegexEqualityComparer(Regex re)
    {
        _re = re;
    }

    public override bool Equals(string x, string y)
    {           
        bool res = false;

        if (Object.ReferenceEquals(x, y)) 
            res = true;
        else if (x != null && y != null)
            res = _re.IsMatch(x) && _re.IsMatch(y);

        String.Format("RES: {0}, {1} = {2}", new object[] { x, y, res }).Dump();

        return res;            
    }

    public override int GetHashCode(string obj)
    {
        return obj.GetHashCode();
    }

    // ------------------------------------------------------------------------------------------------------------------------------------------------ 
    private Regex _re;
}

Called with:

RegexEqualityComparer comparer = new RegexEqualityComparer(@"^-");

new string[] { "1", "-4" }.Intersect(new string[] { "1", "-" }, comparer).Dump();

I expect this to give me { "1", "-4" }, since according to the equality comparer both elements of the first set appear in the second. What it actually gives is:

RES: 1, 1 = True
{ "1" }

The thing that really confuses me is that, according to the LinqPad Dump() in the comparer, it never even tries to compare "-4" with anything; the only dump is RES: 1, 1 = True.

I'm sure I'm missing something obvious here, but can't see it at all!

Whelkaholism asked Oct 20 '25 12:10


1 Answer

public override int GetHashCode(string obj)
{
    return obj.GetHashCode();
}

If Equals(a, b) returns true, then GetHashCode(a) == GetHashCode(b) is required. Your EqualityComparer does not guarantee this. Intersect() looks values up by hash code before it ever calls Equals, so with this bug the matching values are not found.
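To make the broken contract visible outside LinqPad, here is a minimal sketch. The names CountingRegexComparer and HashContractDemo are illustrative and not from the question, and a HashSet<string> stands in for the hash-based lookup that Intersect() performs. It shows that the comparer calls "-4" and "-" equal while a hash-based set cannot find one via the other.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative stand-in for the comparer in the question: same Equals logic,
// same pass-through GetHashCode, with a call counter instead of Dump().
public class CountingRegexComparer : EqualityComparer<string>
{
    private readonly Regex _re;

    public int EqualsCalls { get; private set; }

    public CountingRegexComparer(string pattern)
    {
        _re = new Regex(pattern);
    }

    public override bool Equals(string x, string y)
    {
        EqualsCalls++;
        if (object.ReferenceEquals(x, y)) return true;
        return x != null && y != null && _re.IsMatch(x) && _re.IsMatch(y);
    }

    // The problem: "-4" and "-" compare equal but (almost certainly) hash differently.
    public override int GetHashCode(string obj)
    {
        return obj.GetHashCode();
    }
}

public static class HashContractDemo
{
    public static void Main()
    {
        var comparer = new CountingRegexComparer(@"^-");
        var set = new HashSet<string>(comparer) { "-" };

        // Called directly, the comparer says the two strings are equal.
        Console.WriteLine(comparer.Equals("-4", "-")); // True

        // The set compares hash codes first. Unless the two hashes happen to
        // collide, this is False and Equals is not called for the lookup.
        Console.WriteLine(set.Contains("-4"));
    }
}
```

This matches what the question observed: the only Equals call reported is for "1" against "1", the one pair whose hash codes agree.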

A sensible enough implementation to correspond with your Equals would be:

public override int GetHashCode(string obj)
{
  if (obj == null) return 0;
  if (_re.IsMatch(obj)) return 1;
  return obj.GetHashCode(); // catch the reference-equals part for non-matches.
}
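Put together with the Equals from the question, the comparer might look like the sketch below. The Dump() call is dropped so it runs outside LinqPad, and the IntersectDemo wrapper is an illustrative addition rather than part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class RegexEqualityComparer : EqualityComparer<string>
{
    private readonly Regex _re;

    public RegexEqualityComparer(string pattern, RegexOptions options = RegexOptions.None)
    {
        _re = new Regex(pattern, options);
    }

    // Same logic as the question's Equals, minus the Dump() call.
    public override bool Equals(string x, string y)
    {
        if (object.ReferenceEquals(x, y)) return true;
        return x != null && y != null && _re.IsMatch(x) && _re.IsMatch(y);
    }

    public override int GetHashCode(string obj)
    {
        if (obj == null) return 0;
        if (_re.IsMatch(obj)) return 1; // every matching string shares one hash
        return obj.GetHashCode();       // non-matches: same reference, same hash
    }
}

public static class IntersectDemo
{
    public static string[] Run()
    {
        var comparer = new RegexEqualityComparer(@"^-");
        return new[] { "1", "-4" }.Intersect(new[] { "1", "-" }, comparer).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // 1, -4
    }
}
```

Returning a constant for every match puts all matching strings in the same bucket. That is harmless for small inputs, but lookups degrade towards a linear scan if many strings match the pattern.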

The fact that two distinct string objects with the same characters would be considered non-equal (i.e. it considers new string('1', 1) different from "1") is perhaps deliberate or perhaps a bug. Maybe the ReferenceEquals() should be string's ==?
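If value equality is what was intended, one possible variant is sketched below. ValueRegexEqualityComparer and ValueEqualityDemo are hypothetical names, and the only change from the question's logic is == in place of ReferenceEquals(). The hash code needs no further change, because strings with equal characters already share a hash code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class ValueRegexEqualityComparer : EqualityComparer<string>
{
    private readonly Regex _re;

    public ValueRegexEqualityComparer(string pattern)
    {
        _re = new Regex(pattern);
    }

    public override bool Equals(string x, string y)
    {
        if (x == y) return true; // ordinal value equality; also true for two nulls
        return x != null && y != null && _re.IsMatch(x) && _re.IsMatch(y);
    }

    public override int GetHashCode(string obj)
    {
        if (obj == null) return 0;
        if (_re.IsMatch(obj)) return 1;
        return obj.GetHashCode(); // equal characters give equal hash codes
    }
}

public static class ValueEqualityDemo
{
    public static void Main()
    {
        string literal = "1";
        string built = new string('1', 1); // same characters, separate object

        Console.WriteLine(object.ReferenceEquals(literal, built)); // False
        Console.WriteLine(new ValueRegexEqualityComparer(@"^-").Equals(literal, built)); // True
    }
}
```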

Jon Hanna answered Oct 23 '25 03:10