NumPy

Question

I have two NumPy 1D arrays a and b.

How do I compare them lexicographically, i.e. the same way Python compares tuples?

The main requirement is that this be done lazily: the function should return as soon as the result is known, at the left-most position where the arrays differ.

I'm also looking for the fastest solution for NumPy arrays, ideally a vectorized implementation, perhaps built from other NumPy functions.

Otherwise, a simple non-lazy implementation could look like this:

i = np.flatnonzero((a < b) != (a > b))
print('a ' + ('==' if i.size == 0 else '<' if a[i[0]] < b[i[0]] else '>') + ' b')

Or a simple variant whose comparison is lazy, but which is slow because the arrays are first converted to pure Python types:

ta, tb = tuple(a), tuple(b)
print('a ' + ('<' if ta < tb else '==' if ta == tb else '>') + ' b')

Another option would be np.lexsort, but it is unclear whether it is optimized for the case of just two columns (two 1D arrays), or whether it is lazy at all. Its result is also probably not enough to distinguish the three outcomes <, == and >; it can probably only tell whether a <= b. In addition, lexsort needs non-lazy preprocessing such as np.stack and reversing the row order.

print('a ' + ('<=' if np.lexsort(np.stack((a, b), 1)[::-1])[0] == 0 else '>') + ' b')

But can this be implemented in NumPy both lazily and fast? I need lazy behavior because the 1D arrays can be quite large, while in most cases the comparison result is known very close to the beginning.

Nick · Accepted Answer

In plain Python you'd iterate over the zipped lists:

def lazy_compare(a, b):
    for x, y in zip(a, b):
        if x < y:
            return 'a < b'
        if x > y:
            return 'a > b'
    return 'a == b'

e.g.

print(lazy_compare(['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'b', 'd', 'e']))
print(lazy_compare(['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'f']))
print(lazy_compare(['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e']))

Output:

a > b
a < b
a == b

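Since zip returns an iterator that generates values only as they are consumed, this is lazy: it returns as soon as it finds a differing pair, and it traverses both lists in full only if they are equal (or differ only in the last element). Note that zip stops at the shorter input, so, unlike tuple comparison, this function reports 'a == b' when one list is a strict prefix of the other.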
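For NumPy arrays, the same early-exit idea can be applied block by block, so that each step is vectorized while the scan still stops near the first difference. The following is only a minimal sketch, not part of the answer above: the name lazy_compare_chunked and the chunk parameter are hypothetical, it assumes the arrays contain no NaN values, and it returns -1, 0 or 1 rather than a string. The block size doubles on each step, so a difference near the start is found cheaply, while the total work for equal arrays stays close to a single full pass.

```python
import numpy as np

def lazy_compare_chunked(a, b, chunk=1024):
    """Return -1, 0 or 1 if a is lexicographically <, == or > b.

    Hypothetical helper; assumes 1D arrays without NaN values.
    """
    n = min(len(a), len(b))
    start = 0
    while start < n:
        stop = min(start + chunk, n)
        # Basic slices are views, so no data is copied here.
        x, y = a[start:stop], b[start:stop]
        diff = np.flatnonzero(x != y)
        if diff.size:
            i = diff[0]  # left-most differing position in this block
            return -1 if x[i] < y[i] else 1
        start = stop
        chunk *= 2  # grow the block so long equal prefixes stay cheap
    # Common prefix is equal: the shorter array sorts first, as with tuples.
    return (len(a) > len(b)) - (len(a) < len(b))
```

The starting block size is a tuning knob; a good value depends on how early the arrays usually differ and should be measured on real data.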

NumPy - fastest lazy lexicographical comparing of 1D arrays

Tags: python, arrays, comparison, lazy-evaluation

Arty
