
Python: Assigning # values in a list to bins, by rounding up


I want a function that can take a series and a set of bins, and basically round up to the nearest bin. For example:

my_series = [ 1, 1.5, 2, 2.3,  2.6,  3]
def my_function(my_series, bins):
    ...

my_function(my_series, bins=[1,2,3])
> [1,2,2,3,3,3]

This seems very close to what NumPy's np.digitize is intended to do, but it produces the wrong values for my purpose (asterisks mark the wrong ones):

np.digitize(my_series, bins= [1,2,3], right=False)
> [1, 1*, 2, 2*, 2*, 3]

The reason why it's wrong is clear from the documentation:

Each index i returned is such that bins[i-1] <= x < bins[i] if bins is monotonically increasing, or bins[i-1] > x >= bins[i] if bins is monotonically decreasing. If values in x are beyond the bounds of bins, 0 or len(bins) is returned as appropriate. If right is True, then the right bin is closed so that the index i is such that bins[i-1] < x <= bins[i] or bins[i-1] >= x > bins[i] if bins is monotonically increasing or decreasing, respectively.
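To make the quoted rule concrete, here is a small sketch that runs np.digitize on the data from the question with both settings of right. It only uses the documented call and the question's own values; the variable names left_closed and right_closed are illustrative.

```python
import numpy as np

my_series = [1, 1.5, 2, 2.3, 2.6, 3]
bins = [1, 2, 3]

# right=False: index i satisfies bins[i-1] <= x < bins[i]
left_closed = np.digitize(my_series, bins, right=False)

# right=True: index i satisfies bins[i-1] < x <= bins[i]
right_closed = np.digitize(my_series, bins, right=True)

print(left_closed.tolist())   # [1, 1, 2, 2, 2, 3]
print(right_closed.tolist())  # [0, 1, 1, 2, 2, 2]
```

Note that both calls return indices into bins, not the bin values themselves.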

I can get closer to what I want if I pass the bins in decreasing order and set right to True:

np.digitize(my_series, bins= [3,2,1], right=True)
> [3, 2, 2, 1, 1, 1]

but then I'll have to think of a way of methodically swapping the lowest index (1) with the highest index (3). That's simple when there are just 3 bins, but it will get hairier as the number of bins grows. There must be a more elegant way of doing all this.

Afflatus asked Sep 08 '16 04:09



2 Answers

We can use np.digitize with right=True to get the indices, and then np.take to extract the corresponding elements from bins. Here a is the input series (my_series in the question):

np.take(bins,np.digitize(a,bins,right=True))
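As a minimal sketch, the one-liner above could be wrapped in a function like the one the question asks for. The name round_up_to_bin is hypothetical, and the sketch assumes bins is sorted in increasing order and that no value exceeds the last bin; a larger value would make np.digitize return len(bins), and np.take would then raise an IndexError with its default mode.

```python
import numpy as np

def round_up_to_bin(values, bins):
    """Map each value to the smallest bin that is >= the value.

    Assumes `bins` is increasing and no value exceeds bins[-1].
    (Hypothetical helper name, not part of NumPy.)
    """
    bins = np.asarray(bins)
    # right=True: index i satisfies bins[i-1] < x <= bins[i]
    idx = np.digitize(values, bins, right=True)
    return np.take(bins, idx)

my_series = [1, 1.5, 2, 2.3, 2.6, 3]
result = round_up_to_bin(my_series, bins=[1, 2, 3])
print(result.tolist())  # [1, 2, 2, 3, 3, 3]
```

Values below the first bin get index 0 and are therefore rounded up to bins[0].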
Divakar answered Sep 25 '22 14:09



I believe np.searchsorted will do what you want:

Find the indices into a sorted array a such that, if the corresponding elements in v were inserted before the indices, the order of a would be preserved.

In [1]: my_series = [1, 1.5, 2, 2.3, 2.6, 3]

In [2]: bins = [1,2,3]

In [3]: import numpy as np

In [4]: [bins[k] for k in np.searchsorted(bins, my_series)]
Out[4]: [1, 2, 2, 3, 3, 3]

(As of numpy 1.10.0, digitize is implemented in terms of searchsorted.)
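One thing the list comprehension does not cover is a value larger than the last bin: np.searchsorted then returns len(bins), and indexing bins with it fails. A possible sketch that clamps such values to the last bin is shown below. The clamping policy and the name round_up_with_clip are assumptions for illustration, not something the question specifies.

```python
import numpy as np

def round_up_with_clip(values, bins):
    """Round each value up to the nearest bin; values above the
    last bin are clamped to it (an assumed policy)."""
    bins = np.asarray(bins)
    # side="left" gives the first index i with bins[i] >= value
    idx = np.searchsorted(bins, values, side="left")
    # Keep indices inside the valid range 0 .. len(bins) - 1
    idx = np.clip(idx, 0, len(bins) - 1)
    return bins[idx]

clipped = round_up_with_clip([1, 1.5, 2, 2.3, 2.6, 3, 3.7], [1, 2, 3])
print(clipped.tolist())  # [1, 2, 2, 3, 3, 3, 3]
```

If out-of-range values should be treated as errors instead, dropping the np.clip line lets the indexing raise an IndexError.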

Christian Ternus answered Sep 26 '22 14:09
