
Call functions with varying parameters to modify a numpy array efficiently

I want to eliminate the inefficient for loop from this code:

import numpy as np

x = np.zeros((5,5))

for i in range(5):
    x[i] = np.random.choice(i+1, 5)

While keeping output of this form, where row i contains only integers from 0 to i:

[[0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 2. 2. 1. 0.]
 [1. 2. 3. 1. 0.]
 [1. 0. 3. 3. 1.]]

I have tried this

i = np.arange(5)
x[i] = np.random.choice(i+1, 5)

But it outputs

[[0. 1. 1. 3. 3.]
 [0. 1. 1. 3. 3.]
 [0. 1. 1. 3. 3.]
 [0. 1. 1. 3. 3.]
 [0. 1. 1. 3. 3.]]

Is it possible to remove the loop? If not, what is the most efficient way to proceed for a big array and many repetitions?

Xavi Reyes asked Nov 21 '25 13:11


1 Answer

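Create one random int array whose upper bound is at least the largest row limit, then reduce each row with a modulus by its own limit. The largest limit equals the number of rows, so we can use np.random.randint with its high arg set to the no. of rows. The modulus against the column vector s[:,None] then applies a different limit, i+1, across row i. Note that the modulus keeps a row exactly uniform only when that row's limit divides the upper bound; otherwise the smaller values are slightly favoured. Thus, we would have a vectorized implementation like so -

def create_rand_limited_per_row(m,n):
    s = np.arange(1,m+1)   # per-row limits 1..m
    # high=m so that every row can reach its own maximum, also when m > n
    return np.random.randint(low=0,high=m,size=(m,n))%s[:,None]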
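
The step that does the work here is the broadcast of the modulus against s[:,None]. As a minimal sketch, the random draw is swapped for a fixed np.arange array (an assumption made purely so the result is reproducible) to show how each row gets its own divisor -

```python
import numpy as np

m, n = 4, 3
s = np.arange(1, m + 1)               # row limits: [1, 2, 3, 4]
a = np.arange(m * n).reshape(m, n)    # fixed stand-in for the random draw

# s[:, None] has shape (m, 1), so it broadcasts across the n columns:
# row i of `a` is reduced modulo s[i].
out = a % s[:, None]
print(out)
# [[0 0 0]
#  [1 0 1]
#  [0 1 2]
#  [1 2 3]]
```

Row i can never exceed i, which is the property the loop version guarantees through np.random.choice(i+1, n).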

Sample run -

In [45]: create_rand_limited_per_row(m=5,n=5)
Out[45]: 
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [1, 2, 0, 2, 1],
       [0, 0, 1, 3, 0],
       [1, 2, 3, 3, 2]])
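
If exact per-row uniformity matters, a different technique from the modulus trick is to pass the row limits directly as an array-valued upper bound. This is a sketch, assuming NumPy 1.17 or newer for np.random.default_rng; the function name and the seed parameter are hypothetical and not part of the original answer -

```python
import numpy as np

def rand_limited_per_row_direct(m, n, seed=None):
    """Row i holds integers drawn uniformly from 0..i (inclusive)."""
    rng = np.random.default_rng(seed)
    # Exclusive upper bound per row as a column vector of shape (m, 1);
    # it broadcasts against the requested (m, n) output shape.
    high = np.arange(1, m + 1)[:, None]
    return rng.integers(0, high, size=(m, n))

x = rand_limited_per_row_direct(5, 5, seed=0)
print(x)
```

Each row is sampled within its own bound, so no modulus is needed and no value is favoured. Whether this is faster than the modulus version would need to be measured on the target machine.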

For large arrays, the modulus step can be spread over multiple cores with the numexpr module -

import numexpr as ne

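def create_rand_limited_per_row_numepxr(m,n):
    s = np.arange(1,m+1)[:,None]       # per-row limits as a column vector
    a = np.random.randint(0,m,(m,n))   # high=m covers the largest row limit
    return ne.evaluate('a%s')          # a and s are read from the local scope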

Benchmarking

# Original approach
def create_rand_limited_per_row_loopy(m,n):
    x = np.empty((m,n),dtype=int)
    for i in range(m):
        x[i] = np.random.choice(i+1, n)
    return x

Timings on 1k x 1k data -

In [71]: %timeit create_rand_limited_per_row_loopy(m=1000,n=1000)
10 loops, best of 3: 20.6 ms per loop

In [72]: %timeit create_rand_limited_per_row(m=1000,n=1000)
100 loops, best of 3: 14.3 ms per loop

In [73]: %timeit create_rand_limited_per_row_numepxr(m=1000,n=1000)
100 loops, best of 3: 6.98 ms per loop
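
To repeat a comparison like this outside IPython, the standard-library timeit module can stand in for %timeit. This is a minimal sketch: the short function names, the best_ms helper and the 200 x 200 size are assumptions chosen for illustration, and absolute numbers will differ by machine and NumPy version -

```python
import timeit
import numpy as np

def loopy(m, n):
    # Loop version: one np.random.choice call per row.
    x = np.empty((m, n), dtype=int)
    for i in range(m):
        x[i] = np.random.choice(i + 1, n)
    return x

def vectorized(m, n):
    # Modulus version: a single draw, then a per-row modulus.
    s = np.arange(1, m + 1)
    return np.random.randint(low=0, high=m, size=(m, n)) % s[:, None]

def best_ms(func, m, n, repeat=3, number=5):
    # Best average time per call over `repeat` runs, in milliseconds.
    times = timeit.repeat(lambda: func(m, n), repeat=repeat, number=number)
    return min(times) / number * 1e3

for f in (loopy, vectorized):
    print(f"{f.__name__}: {best_ms(f, 200, 200):.3f} ms per call")
```
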

Divakar answered Nov 24 '25 04:11