I am using the following code to digitize an array into 16 bins:
numpy.digitize(array, bins=numpy.histogram(array, bins=16)[1])
I expect that the output is in the range [1, 16], since there are 16 bins. However, one of the values in the returned array is 17. How can this be explained?
This is actually documented behaviour of numpy.digitize():
Each index i returned is such that bins[i-1] <= x < bins[i] if bins is monotonically increasing, or bins[i-1] > x >= bins[i] if bins is monotonically decreasing. If values in x are beyond the bounds of bins, 0 or len(bins) is returned as appropriate.
So in your case, 0 and 17 are also valid return values (note that the bin array returned by numpy.histogram() has length 17). The bins returned by numpy.histogram() cover the range array.min() to array.max(). The condition given in the docs shows that array.min() belongs to the first bin, while array.max() lies outside the last bin -- that's why 0 is not in the output, while 17 is.
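To make this concrete, here is a minimal sketch that reproduces the index 17 and shows one possible way to stay within [1, 16]. The sample data is made up for illustration, and dropping the last edge is just one workaround, not something the documentation prescribes.

```python
import numpy as np

# Hypothetical sample data; any 1-D numeric array behaves the same way.
array = np.array([0.0, 1.0, 2.5, 4.0, 7.5, 10.0])

counts, edges = np.histogram(array, bins=16)
# 16 bins are described by 17 edges, running from array.min() to array.max().
print(len(counts), len(edges))

# array.max() equals the last edge, so it falls outside the half-open
# last bin and digitize reports len(edges) == 17 for it.
idx = np.digitize(array, bins=edges)
print(idx.min(), idx.max())

# Workaround (an assumption about what is wanted): omit the last edge so
# that everything from edges[-2] upward, including the maximum, gets 16.
idx_fixed = np.digitize(array, bins=edges[:-1])
print(idx_fixed.min(), idx_fixed.max())
```

With the last edge removed, the maximum is counted in the 16th bin, which is also how numpy.histogram() treats it when counting.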
numpy.histogram() produces an array of the bin edges, of which there are (number of bins)+1.
In numpy version 1.8, you have the option to choose whether numpy.digitize treats each interval as closed on the right or on the left, using the right argument. The following example is copied from http://docs.scipy.org/doc/numpy/reference/generated/numpy.digitize.html:
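import numpy as np
x = np.array([1.2, 10.0, 12.4, 15.5, 20.])
bins = np.array([0, 5, 10, 15, 20])
# right=True: each bin includes its right edge, i.e. bins[i-1] < x <= bins[i]
np.digitize(x, bins, right=True)
# returns array([1, 2, 3, 4, 4])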
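Note that right=True moves the edge case rather than removing it when the bins come from numpy.histogram(). The sketch below uses made-up sample data to illustrate this: the maximum now lands in bin 16, but the minimum equals the first edge and is reported as 0.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
array = np.array([0.0, 1.0, 2.5, 4.0, 7.5, 10.0])
edges = np.histogram(array, bins=16)[1]

# With right=True the condition is bins[i-1] < x <= bins[i], so
# array.max() (the last edge) gets 16, while array.min() (the first
# edge) no longer satisfies bins[0] < x and gets 0.
idx_right = np.digitize(array, bins=edges, right=True)
print(idx_right.min(), idx_right.max())
```

So with either setting of right, one end of the data range falls outside the bins unless the edges are adjusted.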