Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

algorithm: partition 2 arrays into "only in A", "intersection of A & B", & "only in B"

I have a situation where I have two arrays and I need to partition them so that I end up with 3 arrays:

  • elements that are only in A
  • elements that are only in B
  • elements that are in both A and B

Example:

A = [1, 4, 3, 2]
B = [2, 6, 5, 3]
3part(A,B) => [[1,4], [6,5], [2,3]] # the order of the elements in each array doesn't matter

I've come up with a correct solution, but wonder if it could be quicker. It is (in pseudocode):

3part(A,B) =>
    a_only = []
    b_only = []
    intersect = []

    foreach a in A
        if B.contains(a)
            intersect.push(a)
        else
            a_only.push(a)

    foreach b in B
        if not intersect.contains(b)
            b_only.push(b)

    return [a_only, b_only, intersect]

In my case at hand, A & B will each contain up to 5000 complex structures (instead of ints) and it runs in about 1.5 secs. It gets used as part of a user interaction which can happen frequently, so ideally it would take < .5sec .

BTW, is there a name for this operation as a whole, other than "difference-difference-intersection"?

Thanks!

EDIT:

Based on the suggestions to use a hash, my updated code runs in under 40ms :-D

Here is the pseudocode:

(say that "key" is the element that I am using for comparison)

array_to_hash(A, Key)
    h = {}
    foreach a in A
        h[a[Key]] = a
    return h

3part(A,B) =>
    a_only = []
    b_only = []
    intersect = {} // note this is now a hash instead of array

    Ah = array_to_hash(A, 'key')
    Bh = array_to_hash(B, 'key')

    foreach ak in Ah.keys()
        if Bh.hasKey(ak)
            intersect[ak] = Ah[ak]
        else
            a_only.push(Ah[ak])

    foreach bk in Bh.keys()
        if not intersect.hasKey(bk)
            b_only.push(Bh[bk])

    return [a_only, b_only, intersect.values]

Thank you all for the suggestions.

like image 397
dgbillotte Avatar asked Jan 18 '26 10:01

dgbillotte


2 Answers

Your pseudocode is good IF you use a hashed set data structure for all the "arrays".

I guess all current programming environments have decent collections support, including hash-based sets. Here are two examples how to do it in Java, running in more or less O(n+m). In Java, for hashed collections to function properly, it's important that your complex objects implement the hashCode() and the equals() method in a compliant fashion (can often be auto-generated by your IDE).

The first version completely relies on the set-algebra implementation of your library, which should result in O(n) if the library is OK:

private static void test1() {
    Integer[] a = {1, 4, 3, 2};
    Integer[] b = {2, 6, 5, 3};

    Set<Integer> aOnly = new HashSet<>(Arrays.asList(a));
    Set<Integer> bOnly = new HashSet<>(Arrays.asList(b));

    Set<Integer> ab = new HashSet<>(aOnly);
    ab.retainAll(bOnly);
    aOnly.removeAll(ab);
    bOnly.removeAll(ab);

    System.out.println("A only: " + aOnly);
    System.out.println("A and B: " + ab);
    System.out.println("B only: " + bOnly);
}

The second one uses the fact that in Java, the remove() method returns true if the element was present before removing it. If your library doesn't do that you have to

private static void test2() {
    Integer[] a = {1, 4, 3, 2};
    Integer[] b = {2, 6, 5, 3};

    Set<Integer> aOnly = new HashSet<>(Arrays.asList(a));
    Set<Integer> bOnly = new HashSet<>();
    Set<Integer> ab = new HashSet<>();

    for (int bElem : b) {
        if (aOnly.remove(bElem)) {
            ab.add(bElem);
        } else {
            bOnly.add(bElem);
        }
    }

    System.out.println("A only: " + aOnly);
    System.out.println("A and B: " + ab);
    System.out.println("B only: " + bOnly);
}
like image 194
Ralf Kleberhoff Avatar answered Jan 21 '26 02:01

Ralf Kleberhoff


If your arrays are sortable then you could do 2 things

  1. to check if a value is in the other array, simply do a binary search on the other array, Complexity O(nlogm + mlogn)

  2. Or you could merge arrays into the 3 arrays using 2 pointers, since the arrays are sorted, if the first elements are equal add them to the intersection set, incase they are not if element in A < element in B. add the element in A to the array a[] and now check the 2nd element with the first element in B.
    same if B was less than A
    Complexity O(n + m). you can maintain which element we are referring to using 2 pointers

like image 42
marvel308 Avatar answered Jan 21 '26 01:01

marvel308



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!