algorithm: partition 2 arrays into "only in A", "intersection of A & B", & "only in B"

Question

I have a situation where I have two arrays and I need to partition them so that I end up with 3 arrays:

elements that are only in A
elements that are only in B
elements that are in both A and B

Example:

A = [1, 4, 3, 2]
B = [2, 6, 5, 3]
3part(A,B) => [[1,4], [6,5], [2,3]] # the order of the elements in each array doesn't matter

I've come up with a correct solution, but wonder if it could be quicker. It is (in pseudocode):

3part(A,B) =>
    a_only = []
    b_only = []
    intersect = []

    foreach a in A
        if B.contains(a)
            intersect.push(a)
        else
            a_only.push(a)

    foreach b in B
        if not intersect.contains(b)
            b_only.push(b)

    return [a_only, b_only, intersect]

In my case at hand, A & B will each contain up to 5000 complex structures (instead of ints) and it runs in about 1.5 secs. It gets used as part of a user interaction which can happen frequently, so ideally it would take < .5sec .

BTW, is there a name for this operation as a whole, other than "difference-difference-intersection"?

Thanks!

EDIT:

Based on the suggestions to use a hash, my updated code runs in under 40ms :-D

Here is the pseudocode:

(say that "key" is the element that I am using for comparison)

array_to_hash(A, Key)
    h = {}
    foreach a in A
        h[a[Key]] = a
    return h

3part(A,B) =>
    a_only = []
    b_only = []
    intersect = {} // note this is now a hash instead of array

    Ah = array_to_hash(A, 'key')
    Bh = array_to_hash(B, 'key')

    foreach ak in Ah.keys()
        if Bh.hasKey(ak)
            intersect[ak] = Ah[ak]
        else
            a_only.push(Ah[ak])

    foreach bk in Bh.keys()
        if not intersect.hasKey(bk)
            b_only.push(Bh[bk])

    return [a_only, b_only, intersect.values]

Thank you all for the suggestions.

Ralf Kleberhoff · Accepted Answer

Your pseudocode is good IF you use a hashed set data structure for all the "arrays".

I guess all current programming environments have decent collections support, including hash-based sets. Here are two examples how to do it in Java, running in more or less O(n+m). In Java, for hashed collections to function properly, it's important that your complex objects implement the hashCode() and the equals() method in a compliant fashion (can often be auto-generated by your IDE).

The first version completely relies on the set-algebra implementation of your library, which should result in O(n) if the library is OK:

private static void test1() {
    Integer[] a = {1, 4, 3, 2};
    Integer[] b = {2, 6, 5, 3};

    Set<Integer> aOnly = new HashSet<>(Arrays.asList(a));
    Set<Integer> bOnly = new HashSet<>(Arrays.asList(b));

    Set<Integer> ab = new HashSet<>(aOnly);
    ab.retainAll(bOnly);
    aOnly.removeAll(ab);
    bOnly.removeAll(ab);

    System.out.println("A only: " + aOnly);
    System.out.println("A and B: " + ab);
    System.out.println("B only: " + bOnly);
}

The second one uses the fact that in Java, the remove() method returns true if the element was present before removing it. If your library doesn't do that you have to

private static void test2() {
    Integer[] a = {1, 4, 3, 2};
    Integer[] b = {2, 6, 5, 3};

    Set<Integer> aOnly = new HashSet<>(Arrays.asList(a));
    Set<Integer> bOnly = new HashSet<>();
    Set<Integer> ab = new HashSet<>();

    for (int bElem : b) {
        if (aOnly.remove(bElem)) {
            ab.add(bElem);
        } else {
            bOnly.add(bElem);
        }
    }

    System.out.println("A only: " + aOnly);
    System.out.println("A and B: " + ab);
    System.out.println("B only: " + bOnly);
}

marvel308 · Answer

If your arrays are sortable then you could do 2 things

to check if a value is in the other array, simply do a binary search on the other array, Complexity O(nlogm + mlogn)
Or you could merge arrays into the 3 arrays using 2 pointers, since the arrays are sorted, if the first elements are equal add them to the intersection set, incase they are not if element in A < element in B. add the element in A to the array a[] and now check the 2nd element with the first element in B.
same if B was less than A
Complexity O(n + m). you can maintain which element we are referring to using 2 pointers

algorithm: partition 2 arrays into "only in A", "intersection of A & B", & "only in B"

Tags:

arrays

algorithm

sorting

dgbillotte

2 Answers

Ralf Kleberhoff

marvel308

Recent Activity

Donate For Us

algorithm: partition 2 arrays into "only in A", "intersection of A & B", & "only in B"

Tags:

arrays

algorithm

sorting

dgbillotte

2 Answers

Ralf Kleberhoff

marvel308

Related questions

Recent Activity

Donate For Us