What is the fastest way to get the mode of a NumPy array?

Question

I have to find the mode of a NumPy array that I read from an HDF5 file. The array is 1-d and contains floating-point values.

import scipy.stats

my_array = f1[ds_name][()]  # read the whole dataset; Dataset.value was removed in h5py 3.0
mod_value = scipy.stats.mode(my_array)

The array holds about 1 million values, and my script takes about 15 minutes to return the mode. Is there any way to make this faster?

Another question: why does scipy.stats.median(my_array) fail while scipy.stats.mode works? It raises:

AttributeError: module 'scipy.stats' has no attribute 'median'
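On the second question: the error says exactly what happened. `scipy.stats` has a `mode` function but no `median` function, so the attribute lookup fails. The median is provided by NumPy as `np.median`. A minimal sketch, using a small hypothetical array in place of the HDF5 data:

```python
import numpy as np

# Hypothetical stand-in for the 1-d float array read from the HDF5 file.
my_array = np.array([3.5, 1.0, 2.25, 1.0, 4.0])

# The median comes from NumPy, not scipy.stats.
med_value = np.median(my_array)
print(med_value)  # 2.25
```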

Warren Weckesser · Accepted Answer

The implementation of scipy.stats.mode has a Python loop for handling the axis argument with multidimensional arrays. The following simple implementation, for one-dimensional arrays only, is faster:

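import numpy as np

def mode1(x):
    # Sorted distinct values of x and the number of times each occurs
    values, counts = np.unique(x, return_counts=True)
    m = counts.argmax()  # index of the most frequent value (first one on ties)
    return values[m], counts[m]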
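The question's data is floating point, and `mode1` applies unchanged because `np.unique` works on float arrays by comparing values for exact equality. A small sketch with hypothetical float data shows two consequences: values that differ only by rounding error count as distinct, and on a tie `argmax` returns the first maximum, so the smallest tied value is reported (the output of `np.unique` is sorted).

```python
import numpy as np

def mode1(x):
    # np.unique returns the sorted distinct values and how often each occurs.
    values, counts = np.unique(x, return_counts=True)
    # argmax picks the first index of the largest count, so ties resolve
    # to the smallest value because `values` is sorted.
    m = counts.argmax()
    return values[m], counts[m]

# Hypothetical float data: 0.5 and 2.0 both appear twice.
data = np.array([2.0, 0.5, 3.75, 2.0, 0.5, 1.25])
value, count = mode1(data)
print(value, count)  # 0.5 2

# Exact equality: 0.1 + 0.2 != 0.3 in binary floating point, so these
# are two different values, each with count 1.
print(mode1(np.array([0.1 + 0.2, 0.3]))[1])  # 1
```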

Here's an example. First, make an array of integers with length 1000000.

In [40]: x = np.random.randint(0, 1000, size=(2, 1000000)).sum(axis=0)

In [41]: x.shape
Out[41]: (1000000,)

Check that scipy.stats.mode and mode1 give the same result.

In [42]: from scipy.stats import mode

In [43]: mode(x)
Out[43]: ModeResult(mode=array([1009]), count=array([1066]))

In [44]: mode1(x)
Out[44]: (1009, 1066)

Now check the performance.

In [45]: %timeit mode(x)
2.91 s ± 18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [46]: %timeit mode1(x)
39.6 ms ± 83.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

That is 2.91 seconds for mode(x) versus 39.6 milliseconds for mode1(x), roughly 70 times faster.
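When the data are non-negative integers with a modest maximum, as in the example `x` (each entry is a sum of two values in 0..999, so at most 1998), a different technique than the `np.unique` approach is available: `np.bincount` counts every integer in one O(n + max) pass without sorting. It does not apply to the floating-point data in the question. The sketch below uses a hypothetical helper `mode_bincount`, checks it against `mode1`, and times both with the standard `timeit` module, which stands in for IPython's `%timeit` in a plain script. No timings are claimed here; they depend on the machine.

```python
import timeit

import numpy as np

def mode1(x):
    values, counts = np.unique(x, return_counts=True)
    m = counts.argmax()
    return values[m], counts[m]

def mode_bincount(x):
    # Hypothetical helper: valid only for 1-d arrays of non-negative integers.
    # counts[v] is the number of times the integer v occurs in x.
    counts = np.bincount(x)
    m = counts.argmax()  # first maximum, i.e. the smallest tied value
    return m, counts[m]

rng = np.random.default_rng(0)
x = rng.integers(0, 1000, size=(2, 1000000)).sum(axis=0)

# Both functions should report the same value and count.
assert mode_bincount(x) == mode1(x)

# Average seconds per call over 10 calls each.
t_unique = timeit.timeit(lambda: mode1(x), number=10) / 10
t_bincount = timeit.timeit(lambda: mode_bincount(x), number=10) / 10
print(f"mode1: {t_unique * 1e3:.1f} ms, mode_bincount: {t_bincount * 1e3:.1f} ms")
```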

Tags: python, numpy, scipy

Asked by Heli
