Is there a better way to use another cell to create a value in a NumPy matrix?

Tags: python, numpy

I want to avoid using any loop when building a NumPy matrix. The problem appears when I create the third column: its value must differ from the value in the second column of the same row. In other words, each row should be: [random, random, random but not equal to the previous one].

Later I will need to run tests on large files (1,500,000 lines in a GTFS file), and if I keep using loops the computation will be slow.

import numpy as np

T=8
M=np.zeros([T,4])

M[:,0]=np.random.randint(1,4,T)  
M[:,1]=np.random.randint(1,4,T)
for i in range(0,T):
    a=np.array([1,2,3])
    M[i,2]=np.random.choice(a[a!=M[i,1]])  # door removed

print (M)

I would like to replace the loop over i with a vectorized NumPy operation on M[:,1].

asked Dec 20 '25 at 19:12 by Tarik Bendahman



1 Answer

You could generate your random matrix all at once:

T = 8000
low, high = 1, 4
np.random.seed(1) # for reproducibility
m = np.random.randint(low, high,(T, 4))

Then you can recalculate m[:, 2] by adding clipped random offsets to m[:, 1] and wrapping the sums so they stay within [low, high):

m[:,2]=(m[:,1]+np.clip(m[:,2],low,high-2)-low) % (high - low) + low

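Wrapping works by shifting the values so the range starts at 0, taking the remainder modulo the range width (high - low), and shifting back by low. Because the added offset (1 or 2 here) is never a multiple of the width (3), the new value can never equal the old one:

np.any(m[:,1] == m[:,2])
# False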
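The wrapping formula can be checked exhaustively on small ranges. In this sketch the helper name `wrap_with_offset` is illustrative, and the offset is written as k in [1, n - 1] with n = high - low. For low = 1 this is exactly the range [low, high - 2] used above, but for other values of low the two ranges differ (with low = 0, for instance, [low, high - 2] would allow an offset of 0 and repeat the value), so the assumption here is that [1, n - 1] is the general form to use.

```python
import numpy as np

def wrap_with_offset(col, k, low, high):
    """Shift each value of col by offset k and wrap the result back into [low, high)."""
    n = high - low
    return (col - low + k) % n + low

# Exhaustive check: every start value combined with every nonzero offset 1..n-1
for low, high in [(1, 4), (0, 5), (2, 7)]:
    n = high - low
    col, k = np.meshgrid(np.arange(low, high), np.arange(1, n), indexing="ij")
    out = wrap_with_offset(col, k, low, high)
    assert np.all(out != col)                   # never repeats the input value
    assert np.all((out >= low) & (out < high))  # stays inside [low, high)
    # each start value reaches every other value exactly once
    for row, c in zip(out.tolist(), col[:, 0].tolist()):
        assert sorted(row) == sorted(set(range(low, high)) - {c})
print("wrapping check passed")
```

Since each start value maps to every other value exactly once, drawing k uniformly from [1, n - 1] picks the new value uniformly among the values that differ from the old one.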

Edit #1

The above method yields random values for m[:,2], but there is some correlation between m[:,1] and m[:,2]: clipping maps 3 to 2, so an offset of 2 is twice as likely as an offset of 1. It is better not to reuse clipped values and to generate fully random new offsets instead:

m[:,2]=(m[:,1]+np.random.randint(low,high-1,T)-low) % (high - low) + low

# counts of each offset (m[:,2]-m[:,1]) % 3; this output is from a run with T = 1,000,000
np.bincount((m[:,2]-m[:,1])%3)
# array([     0, 499972, 500028], dtype=int64)
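To see where the correlation in the first version comes from, one can compare the distribution of the offset (m[:,2] - m[:,1]) % 3 under both methods. This is an illustrative sketch using NumPy's `Generator` API (`np.random.default_rng`) rather than the legacy `np.random` functions; the variable names are assumptions, not from the answer. With clipping, the offsets 1 and 2 should appear with fractions of about 1/3 and 2/3; with fresh offsets, about 1/2 each.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1_000_000
low, high = 1, 4
n = high - low

m1 = rng.integers(low, high, T)         # second column, values 1..3
m2_random = rng.integers(low, high, T)  # random third column before the fix, values 1..3

offsets = {
    # first version: clip to [1, 2], which maps 3 -> 2
    "clipped": np.clip(m2_random, low, high - 2),
    # Edit #1: fresh offsets drawn uniformly from [1, 2]
    "fresh": rng.integers(low, high - 1, T),
}

fractions = {}
for name, off in offsets.items():
    m2 = (m1 + off - low) % n + low
    # fraction of rows with offset 0, 1, 2 (offset 0 would mean a repeated value)
    fractions[name] = np.bincount((m2 - m1) % n, minlength=n) / T
    print(name, fractions[name].round(3))
```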

Generating fresh random offsets should be marginally slower than reusing the clipped column, but it still takes just a few milliseconds for one million points on my PC.
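To compare timings on another machine, one can time the question's per-row loop against the vectorized expression with `time.perf_counter`. This is a rough sketch, not a benchmark: the function names are illustrative, each version runs once rather than with `timeit` repetitions, and the numbers depend on hardware and NumPy version, so no expected output is given. The loop runs on fewer rows because it is much slower, which is why the per-row cost is the fairer figure to compare.

```python
import time
import numpy as np

low, high = 1, 4
rng = np.random.default_rng(1)

def loop_version(col):
    """Per-row approach from the question: choose among the values that differ from col[i]."""
    a = np.arange(low, high)
    out = np.empty_like(col)
    for i in range(col.size):
        out[i] = rng.choice(a[a != col[i]])
    return out

def vectorized_version(col):
    """Offset-and-wrap approach from Edit #1 (offsets in [low, high - 2], assumes low == 1)."""
    return (col + rng.integers(low, high - 1, col.size) - low) % (high - low) + low

# The loop gets a smaller T because it is far slower per element
for T, func in [(20_000, loop_version), (1_000_000, vectorized_version)]:
    col = rng.integers(low, high, T)
    start = time.perf_counter()
    out = func(col)
    elapsed = time.perf_counter() - start
    assert np.all(out != col)
    print(f"{func.__name__}: T={T:,}, {elapsed * 1e3:.1f} ms, "
          f"{elapsed / T * 1e9:.0f} ns per row")
```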

answered Dec 23 '25 at 07:12 by Brenlla
