Assuming I have a numpy array like: [1,2,3,4,5,6] and another array: [0,0,1,2,2,1] I want to sum the items in the first array by group (the second array) and obtain n-groups results in group number order (in this case the result would be [3, 9, 9]). How do I do this in numpy?
The numpy function bincount
was made exactly for this purpose and I'm sure it will be much faster than the other methods for all sizes of inputs:
data = [1,2,3,4,5,6]
ids = [0,0,1,2,2,1]
np.bincount(ids, weights=data) #returns [3,9,9] as a float64 array
The i-th element of the output is the sum of all the data
elements corresponding to "id" i
.
Hope that helps.
This is a vectorized method of doing this sum based on the implementation of numpy.unique. According to my timings it is up to 500 times faster than the loop method and up to 100 times faster than the histogram method.
def sum_by_group(values, groups):
order = np.argsort(groups)
groups = groups[order]
values = values[order]
values.cumsum(out=values)
index = np.ones(len(groups), 'bool')
index[:-1] = groups[1:] != groups[:-1]
values = values[index]
groups = groups[index]
values[1:] = values[1:] - values[:-1]
return values, groups
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With