Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

LINQ for diffing sets

Tags:

c#

linq

I have the following arrays:

var original= new int[] { 2, 1, 3 };
var target = new int[] { 1, 3, 4 };
enum Operation {Added,Removed}

I would like to execute a LINQ query that would return the following:

{{2,Removed},{4,Added}}

Limitation: I would like LINQ to perform this very efficiently and avoid and O(n^2) style algorithms.

like image 238
Sam Saffron Avatar asked Feb 02 '26 00:02

Sam Saffron


2 Answers

Perhaps a LINQ solution is not the best option in this case.

This will produce a dictionary with the result that you want.

Dictionary<int, Operation> difference = new Dictionary<int,Operation>();
foreach (int value in original) {
    difference.Add(value, Operation.Removed);
}
foreach (int value in target) {
    if (difference.ContainsKey(value)) {
        difference.Remove(value);
    } else {
        difference.Add(value, Operation.Added);
    }
}

To keep the size of the dictionary down, perhaps it's possible to loop the enumerations in parallell. I'll have a look at that...

Edit:
Here it is:

Dictionary<int, Operation> difference = new Dictionary<int,Operation>();
IEnumerator<int> o = ((IEnumerable<int>)original).GetEnumerator();
IEnumerator<int> t = ((IEnumerable<int>)target).GetEnumerator();
bool oActive=true, tActive=true;
while (oActive || tActive) {
    if (oActive && (oActive = o.MoveNext())) {
        if (difference.ContainsKey(o.Current)) {
            difference.Remove(o.Current);
        } else {
            difference.Add(o.Current, Operation.Removed);
        }
    }
    if (tActive && (tActive = t.MoveNext())) {
        if (difference.ContainsKey(t.Current)) {
            difference.Remove(t.Current);
        } else {
            difference.Add(t.Current, Operation.Added);
        }
    }
}

Edit2:
I did some performance testing. The first version runs 10%-20% faster, both with sorted lists and randomly ordered lists.

I made lists with numbers from 1 to 100000, randomly skipping 10% of the numbers. On my machine the first version of the code matches the lists in about 16 ms.

like image 198
Guffa Avatar answered Feb 03 '26 13:02

Guffa


enum Operation { Added, Removed, }

static void Main(string[] args)
{
    var original = new int[] { 2, 1, 3 };
    var target = new int[] { 1, 3, 4 };

    var result = original.Except(target)
        .Select(i => new { Value = i, Operation = Operation.Removed, })
        .Concat(
            target.Except(original)
            .Select(i => new { Value = i, Operation = Operation.Added, })
            );

    foreach (var item in result)
        Console.WriteLine("{0}, {1}", item.Value, item.Operation);
}

I don't think you can do this with LINQ using only a single pass given the stock LINQ extension methods but but might be able to code a custom extension method that will. Your trade off will likely be the loss of deferred execution. It would be interesting to compare the relative performance of both.

like image 36
andleer Avatar answered Feb 03 '26 13:02

andleer



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!