I have an array of integer labels and I would like to determine how many times each label occurs and store those counts in an array of the same shape as the input. This can be accomplished with the following loop:
def counter(labels):
    sizes = numpy.zeros(labels.shape)
    for num in numpy.unique(labels):
        mask = labels == num
        sizes[mask] = numpy.count_nonzero(mask)
    return sizes
with input:
array = numpy.array([
    [0, 1, 2, 3],
    [0, 1, 1, 3],
    [3, 1, 3, 1]])
counter(array) returns:
array([[ 2., 5., 1., 4.],
[ 2., 5., 5., 4.],
[ 4., 5., 4., 5.]])
However, for large arrays with many unique labels (60,000 in my case), this takes a considerable amount of time. This is the first step in a complex algorithm and I can't afford to spend more than about 30 seconds on it. Is there an existing function that can accomplish this? If not, how can I speed up the existing loop?
Approach #1
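Here's one using np.unique:
_, tags, count = np.unique(labels, return_counts=True, return_inverse=True)
# reshape covers NumPy versions that return a flattened inverse for multi-dimensional input
sizes = count[tags].reshape(labels.shape)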
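As a quick check of the np.unique idea against the 2-D example from the question, here is a minimal sketch. The function name counter_unique is hypothetical and used only for illustration. Note that it returns integer counts, whereas the original loop fills a float array.

```python
import numpy as np

def counter_unique(labels):
    """Per-element label counts via np.unique (hypothetical helper name)."""
    labels = np.asarray(labels)
    # inverse maps every element to the index of its label in the sorted unique values
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    # reshape so the result matches the input shape even if inverse comes back flattened
    return counts[inverse].reshape(labels.shape)

array = np.array([
    [0, 1, 2, 3],
    [0, 1, 1, 3],
    [3, 1, 3, 1]])

sizes = counter_unique(array)
print(sizes)
```

This works for any integer labels, including negative ones, because the counts are indexed by position among the unique values rather than by the label value itself.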
Approach #2
With non-negative integers in labels, np.bincount gives a simpler and more efficient way:
# ravel() because np.bincount accepts only 1-D input; it is a no-op for 1-D labels
sizes = np.bincount(labels.ravel())[labels]
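A minimal sketch of the np.bincount idea applied to the 2-D example from the question, assuming all labels are non-negative integers. The function name counter_bincount is hypothetical. Because np.bincount only accepts 1-D input, the sketch flattens the labels for counting and then indexes the counts with the original array to recover the input shape.

```python
import numpy as np

def counter_bincount(labels):
    """Per-element label counts via np.bincount (hypothetical helper name).

    Assumes labels holds non-negative integers.
    """
    labels = np.asarray(labels)
    # counts[k] is the number of occurrences of label k in the whole array
    counts = np.bincount(labels.ravel())
    # fancy indexing with the original array yields a result of the same shape
    return counts[labels]

array = np.array([
    [0, 1, 2, 3],
    [0, 1, 1, 3],
    [3, 1, 3, 1]])

sizes = counter_bincount(array)
print(sizes)
```

The counts array has length max(labels) + 1, so this trades memory for speed when the largest label is much bigger than the number of distinct labels.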
Runtime test
The setup draws labels from 60,000 possible non-negative values. Two such 1-D arrays, of lengths 100,000 and 1,000,000, are timed.
Set #1:
In [192]: np.random.seed(0)
     ...: labels = np.random.randint(0,60000,(100000))
In [193]: %%timeit
     ...: sizes = np.zeros(labels.shape)
     ...: for num in np.unique(labels):
     ...:     mask = labels == num
     ...:     sizes[mask] = np.count_nonzero(mask)
1 loop, best of 3: 2.32 s per loop
In [194]: %timeit np.bincount(labels)[labels]
1000 loops, best of 3: 376 µs per loop
In [195]: 2320/0.376 # Speedup figure
Out[195]: 6170.212765957447
Set #2:
In [196]: np.random.seed(0)
     ...: labels = np.random.randint(0,60000,(1000000))
In [197]: %%timeit
     ...: sizes = np.zeros(labels.shape)
     ...: for num in np.unique(labels):
     ...:     mask = labels == num
     ...:     sizes[mask] = np.count_nonzero(mask)
1 loop, best of 3: 43.6 s per loop
In [198]: %timeit np.bincount(labels)[labels]
100 loops, best of 3: 5.15 ms per loop
In [199]: 43600/5.15 # Speedup figure
Out[199]: 8466.019417475727