Consider the following:
import numpy as np
tmp1 = ['a', 'b', 'c', 'd', 'e']
tmp2 = ['f', 'g', 'h', 'b', 'd']
tmp3 = ['b', 'i', 'j', 'k', 'l']
matr = np.array([tmp1, tmp2, tmp3])
matr
Yields a matrix:
array([['a', 'b', 'c', 'd', 'e'],
       ['f', 'g', 'h', 'b', 'd'],
       ['b', 'i', 'j', 'k', 'l']],
      dtype='|S1')
Now, I want to count, for each row, how many of its values also appear in a given vector. Say,
vec = ['a', 'c', 'f', 'b']
[sum([y in vec for y in row]) for row in matr]
Returns,
[3, 2, 1]
This is the desired output. The problem is that my real 'matr' is roughly 1,000,000 x 2,200, and I have 6,700 vectors to compare against, so the solution I have here is far too slow to attempt.
How can I improve what I'm doing?
It's worth noting that the values inside of the matr come from a set of ~30000 values, and I have the full set. I've considered solutions where I make a dict of these 30000 values against each vector, and use the dict to convert to True/False throughout the matrix before just summing by row. I'm not sure if this will help.
With matr and vec both as NumPy arrays (convert a list with np.asarray(vec) first), here's one approach using np.searchsorted:
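def count_in_rowwise(matr, vec):
    # Indices that would sort vec; vec itself is left untouched
    sidx = vec.argsort()
    # Insertion position of every matr element in the sorted view of vec
    idx = np.searchsorted(vec, matr, sorter=sidx)
    # Elements larger than everything in vec get len(vec); reset to a valid index
    idx[idx == len(vec)] = 0
    # An element is in vec only if the value found at its position equals it
    return (vec[sidx[idx]] == matr).sum(1)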
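As a quick sanity check, here is a minimal sketch that runs the function above on the sample data from the question. It assumes Python 3 string arrays and converts vec to an array first; np.isin is used only as an independent cross-check, not as part of the searchsorted method.

```python
import numpy as np

def count_in_rowwise(matr, vec):
    sidx = vec.argsort()
    idx = np.searchsorted(vec, matr, sorter=sidx)
    idx[idx == len(vec)] = 0
    return (vec[sidx[idx]] == matr).sum(1)

matr = np.array([['a', 'b', 'c', 'd', 'e'],
                 ['f', 'g', 'h', 'b', 'd'],
                 ['b', 'i', 'j', 'k', 'l']])
vec = np.asarray(['a', 'c', 'f', 'b'])  # the question's list, as an array

counts = count_in_rowwise(matr, vec)
print(counts)  # [3 2 1]

# Cross-check against a plain membership test
reference = np.isin(matr, vec).sum(axis=1)
print(np.array_equal(counts, reference))  # True
```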
With a comparatively small vec, we can sort it up front and search the sorted copy directly, which gives an alternative way to compute the row counts:
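def count_in_rowwise_v2(matr, vec, assume_sorted=False):
    # Skip the sort when the caller guarantees vec is already sorted
    if assume_sorted:
        sorted_vec = vec
    else:
        sorted_vec = np.sort(vec)
    idx = np.searchsorted(sorted_vec, matr)
    # Out-of-range positions are reset to 0; the equality check rejects them
    idx[idx == len(sorted_vec)] = 0
    return (sorted_vec[idx] == matr).sum(1)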
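One place where assume_sorted could pay off is when matr is too large to process in one go: sort vec once, then reuse it across row chunks. The sketch below is only an illustration under that assumption; chunk_rows is a hypothetical parameter and is tiny here so that the sample data spans more than one chunk.

```python
import numpy as np

def count_in_rowwise_v2(matr, vec, assume_sorted=False):
    sorted_vec = vec if assume_sorted else np.sort(vec)
    idx = np.searchsorted(sorted_vec, matr)
    idx[idx == len(sorted_vec)] = 0
    return (sorted_vec[idx] == matr).sum(1)

matr = np.array([['a', 'b', 'c', 'd', 'e'],
                 ['f', 'g', 'h', 'b', 'd'],
                 ['b', 'i', 'j', 'k', 'l']])
vec = np.asarray(['a', 'c', 'f', 'b'])

sorted_vec = np.sort(vec)  # sort once, reuse for every chunk
chunk_rows = 2             # hypothetical chunk size, tune to available memory

parts = [
    count_in_rowwise_v2(matr[start:start + chunk_rows], sorted_vec,
                        assume_sorted=True)
    for start in range(0, matr.shape[0], chunk_rows)
]
counts = np.concatenate(parts)
print(counts)  # [3 2 1]
```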
The solutions above work on generic inputs (numbers or strings alike). For our specific case of strings, we could optimize further by converting the strings to integer codes with np.unique and then re-using count_in_rowwise or count_in_rowwise_v2 on those codes. That gives us a second approach:
u, ids = np.unique(matr, return_inverse=True)
# Integer codes of vec within u; assumes every value in vec occurs in matr
out = count_in_rowwise(ids.reshape(matr.shape), np.searchsorted(u, vec))
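Since the question mentions 6700 vectors and a known set of about 30000 values, the integer encoding can be done once and combined with the lookup-table idea from the question. The following is a rough sketch of that combination rather than a tuned implementation: encode and count_with_table are hypothetical helper names, and the vectors list is made-up sample data. np.isin builds the per-vector True/False table, so values of a vector that never occur in matr are simply ignored.

```python
import numpy as np

def encode(matr):
    # Map each distinct value to an integer code; done once for all vectors
    u, ids = np.unique(matr, return_inverse=True)
    return u, ids.reshape(matr.shape)

def count_with_table(u, codes, vec):
    # table[k] is True when the k-th unique value occurs in vec
    table = np.isin(u, vec)
    # Look up every code in the table and count the hits per row
    return table[codes].sum(axis=1)

matr = np.array([['a', 'b', 'c', 'd', 'e'],
                 ['f', 'g', 'h', 'b', 'd'],
                 ['b', 'i', 'j', 'k', 'l']])
u, codes = encode(matr)

# Hypothetical query vectors; 'z' does not occur in matr
vectors = [['a', 'c', 'f', 'b'], ['a', 'z']]
results = [count_with_table(u, codes, v) for v in vectors]
print(results[0])  # [3 2 1]
print(results[1])  # [1 0 0]
```

Note that table[codes] materializes a boolean array the size of matr, so for a matrix on the order of 1000000 x 2200 it would likely need to be applied to row chunks.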