I have a situation where I have two arrays and I need to partition them so that I end up with 3 arrays:
Example:
A = [1, 4, 3, 2]
B = [2, 6, 5, 3]
3part(A,B) => [[1,4], [6,5], [2,3]] # the order of the elements in each array doesn't matter
I've come up with a correct solution, but wonder if it could be quicker. It is (in pseudocode):
3part(A,B) =>
a_only = []
b_only = []
intersect = []
foreach a in A
if B.contains(a)
intersect.push(a)
else
a_only.push(a)
foreach b in B
if not intersect.contains(b)
b_only.push(b)
return [a_only, b_only, intersect]
In my case at hand, A & B will each contain up to 5000 complex structures (instead of ints) and it runs in about 1.5 secs. It gets used as part of a user interaction which can happen frequently, so ideally it would take < .5sec .
BTW, is there a name for this operation as a whole, other than "difference-difference-intersection"?
Thanks!
EDIT:
Based on the suggestions to use a hash, my updated code runs in under 40ms :-D
Here is the pseudocode:
(say that "key" is the element that I am using for comparison)
array_to_hash(A, Key)
h = {}
foreach a in A
h[a[Key]] = a
return h
3part(A,B) =>
a_only = []
b_only = []
intersect = {} // note this is now a hash instead of array
Ah = array_to_hash(A, 'key')
Bh = array_to_hash(B, 'key')
foreach ak in Ah.keys()
if Bh.hasKey(ak)
intersect[ak] = Ah[ak]
else
a_only.push(Ah[ak])
foreach bk in Bh.keys()
if not intersect.hasKey(bk)
b_only.push(Bh[bk])
return [a_only, b_only, intersect.values]
Thank you all for the suggestions.
Your pseudocode is good IF you use a hashed set data structure for all the "arrays".
I guess all current programming environments have decent collections support, including hash-based sets. Here are two examples how to do it in Java, running in more or less O(n+m). In Java, for hashed collections to function properly, it's important that your complex objects implement the hashCode() and the equals() method in a compliant fashion (can often be auto-generated by your IDE).
The first version completely relies on the set-algebra implementation of your library, which should result in O(n) if the library is OK:
private static void test1() {
Integer[] a = {1, 4, 3, 2};
Integer[] b = {2, 6, 5, 3};
Set<Integer> aOnly = new HashSet<>(Arrays.asList(a));
Set<Integer> bOnly = new HashSet<>(Arrays.asList(b));
Set<Integer> ab = new HashSet<>(aOnly);
ab.retainAll(bOnly);
aOnly.removeAll(ab);
bOnly.removeAll(ab);
System.out.println("A only: " + aOnly);
System.out.println("A and B: " + ab);
System.out.println("B only: " + bOnly);
}
The second one uses the fact that in Java, the remove() method returns true if the element was present before removing it. If your library doesn't do that you have to
private static void test2() {
Integer[] a = {1, 4, 3, 2};
Integer[] b = {2, 6, 5, 3};
Set<Integer> aOnly = new HashSet<>(Arrays.asList(a));
Set<Integer> bOnly = new HashSet<>();
Set<Integer> ab = new HashSet<>();
for (int bElem : b) {
if (aOnly.remove(bElem)) {
ab.add(bElem);
} else {
bOnly.add(bElem);
}
}
System.out.println("A only: " + aOnly);
System.out.println("A and B: " + ab);
System.out.println("B only: " + bOnly);
}
If your arrays are sortable then you could do 2 things
to check if a value is in the other array, simply do a binary search on the other array, Complexity O(nlogm + mlogn)
Or you could merge arrays into the 3 arrays using 2 pointers, since the arrays are sorted, if the first elements are equal add them to the intersection set, incase they are not if element in A < element in B. add the element in A to the array a[] and now check the 2nd element with the first element in B.
same if B was less than A
Complexity O(n + m).
you can maintain which element we are referring to using 2 pointers
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With