I want to avoid using any loop when building a NumPy matrix. The problem arises with the third column, which must not take the same value as the cell in the second column of the same row. In other words, each row is [random, random, random but different from the previous one].
Later I will need to run tests on large files (1,500,000 lines in a GTFS file). If I keep using loops, the computation will be slow.
import numpy as np
T=8
M=np.zeros([T,4])
M[:,0]=np.random.randint(1,4,T)
M[:,1]=np.random.randint(1,4,T)
for i in range(T):
    a = np.array([1, 2, 3])
    M[i, 2] = np.random.choice(a[a != M[i, 1]], size=1)  # removed door
print(M)
I would like to replace the loop over i with a vectorized operation on M[:,1].
You could generate your random matrix all at once:
T = 8000
low, high = 1, 4
np.random.seed(1) # for reproducibility
m = np.random.randint(low, high,(T, 4))
Then you can recalculate m[:, 2] by adding clipped random numbers to m[:, 1] and wrapping those numbers so they stay within [low, high):
m[:,2]=(m[:,1]+np.clip(m[:,2],low,high-2)-low) % (high - low) + low
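To see why the wrap can never reproduce the value in column 1, here is a small sketch that enumerates every case for low, high = 1, 4. The names `n`, `col1`, `offsets` and `wrapped` are mine and only illustrate the idea: shifting a value by an offset between 1 and n-1 and wrapping with the modulo always lands on a different value in the same range. With low = 1, the clipped values (1 or 2) play the role of that offset.

```python
import numpy as np

low, high = 1, 4
n = high - low                          # number of allowed values (3 here)

col1 = np.arange(low, high)             # every possible column-1 value: [1 2 3]
offsets = np.arange(1, n)[:, None]      # every allowed offset as a column: [[1], [2]]

# Broadcasting gives one row per offset and one column per column-1 value.
wrapped = (col1 - low + offsets) % n + low
print(wrapped)
# [[2 3 1]
#  [3 1 2]]

# No wrapped value equals the column-1 value it was derived from.
assert not np.any(wrapped == col1)
```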
The wrapping relies on the modulo operator. To check that no row ended up with equal values in columns 1 and 2:
np.any(m[:,1]==m[:,2])
# False
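Edit #1
The above method yields random values for m[:,2], but they are not uniform relative to m[:,1]: clipping maps both 2 and 3 to 2, so one of the two offsets is chosen twice as often as the other. It is better not to reuse the clipped values and to generate fresh random ones instead: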
m[:,2]=(m[:,1]+np.random.randint(low,high-1,T)-low) % (high - low) + low
np.bincount((m[:,2]-m[:,1])%3)
# array([ 0, 499972, 500028], dtype=int64)
This should be marginally slower, but it still takes only a few milliseconds for one million points on my PC.
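One caveat: the expressions above draw the offset from [low, high-1), which matches the required offset range [1, high-low-1] only because low is 1 here. Below is a minimal sketch of a variant that should work for any low, assuming at least two allowed values. The helper name `random_other_column` is hypothetical, and it uses NumPy's newer `default_rng` generator instead of `np.random.randint`.

```python
import numpy as np

def random_other_column(col, low, high, rng):
    """Return integers in [low, high) that differ element-wise from `col`.

    Hypothetical helper; requires high - low >= 2.
    """
    n = high - low
    offset = rng.integers(1, n, size=col.shape)   # uniform in 1 .. n-1
    return (col - low + offset) % n + low         # shift, wrap, move back to [low, high)

rng = np.random.default_rng(1)                    # seeded for reproducibility
T = 1_000_000
low, high = 1, 4

m = rng.integers(low, high, size=(T, 4))
m[:, 2] = random_other_column(m[:, 1], low, high, rng)

# Column 2 never matches column 1 and stays inside [low, high).
assert not np.any(m[:, 1] == m[:, 2])
assert m[:, 2].min() >= low and m[:, 2].max() < high

# Offset 0 never occurs; the remaining offsets should be roughly balanced.
counts = np.bincount((m[:, 2] - m[:, 1]) % (high - low))
print(counts)
```

The same function can be applied to a column read from a large file, since it only needs the column itself and the value range.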