I've got a pandas dataframe like this:
     id  foo  
 0   A   col1 
 1   A   col2  
 2   B   col1  
 3   B   col3  
 4   D   col4  
 5   C   col2  
I'd like to create four additional columns, col1, col2, col3 and col4, one for each unique value in the foo column:
     id  foo   col1 col2 col3 col4
 0   A   col1   75   20   5    0
 1   A   col2   20   80   0    0
 2   B   col1   82   10   8    0
 3   B   col3   5    4   80   11
 4   D   col4   0    5   10   85
 5   C   col2   12   78   5    5
The logic for creating the columns is as follows:
if foo = col1, then col1 contains a random number between 75 and 100, and the other columns (col2, col3, col4) contain random numbers such that each row totals 100. The same applies to the other values of foo.
I can manually create a new column and assign a random number, but I'm unsure how to enforce that each row sums to 100.
Appreciate any help!
My two cents
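import numpy as np
import pandas as pd

# df is the dataframe from the question (columns id and foo)
d = []
# one dominant value per row; randint excludes the upper bound, so this gives 75..99
s = np.random.randint(75, 100, size=len(df))
for x in 100 - s:
    # random weights for splitting the remainder x; starting at 1 keeps a.sum() nonzero
    a = np.random.randint(1, 100, size=3)
    # three non-negative integers that sum to x
    b = np.random.multinomial(x, a / a.sum())
    d.append(b.tolist())
# shuffle each row, so the dominant value lands in a random position, not the one named by foo
s = [np.random.choice(x, 4, replace=False) for x in np.column_stack((s, np.array(d)))]
# axis must be passed by keyword in recent pandas versions
df = pd.concat([df, pd.DataFrame(s, index=df.index)], axis=1)
df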
  id   foo   0   1   2   3
0  A  col1  16   1   7  76
1  A  col2   4   2  91   3
2  B  col1   4   4   1  91
3  B  col3  78   8   8   6
4  D  col4   8  87   3   2
5  C  col2   2   0  11  87
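In the output above the large value ends up in a random position, while the question asks for it in the column named by foo. Here is a minimal sketch of one way to do that, not a definitive implementation. The function name add_share_columns and its parameters (key, low, high, seed) are made up for illustration, and it assumes foo has at least two unique values. It also splits the remainder with equal probabilities rather than random weights, which is a simplification on my part.

```python
import numpy as np
import pandas as pd

def add_share_columns(df, key="foo", low=75, high=100, seed=None):
    """Return a copy of df with one integer column per unique value of `key`.

    In each row, the column named by `key` gets a random integer in
    [low, high]; the remainder is split at random over the other
    columns, so every row sums to 100. (Hypothetical helper.)
    """
    rng = np.random.default_rng(seed)
    cols = sorted(df[key].unique())
    n_rows, n_cols = len(df), len(cols)

    # Dominant value per row; endpoint=True makes `high` inclusive.
    main = rng.integers(low, high, size=n_rows, endpoint=True)

    out = np.zeros((n_rows, n_cols), dtype=int)
    # Position of each row's own column within `cols`.
    pos = pd.Index(cols).get_indexer(df[key])
    for i in range(n_rows):
        others = [j for j in range(n_cols) if j != pos[i]]
        # Equal-probability multinomial: non-negative integers
        # that sum to the remainder 100 - main[i].
        probs = np.full(len(others), 1 / len(others))
        out[i, others] = rng.multinomial(100 - main[i], probs)
        out[i, pos[i]] = main[i]

    new_cols = pd.DataFrame(out, columns=cols, index=df.index)
    return pd.concat([df, new_cols], axis=1)

# The dataframe from the question.
df = pd.DataFrame({"id": list("AABBDC"),
                   "foo": ["col1", "col2", "col1", "col3", "col4", "col2"]})
result = add_share_columns(df, seed=0)
print(result)
```

Passing a seed makes the result reproducible; leave it as None to get different numbers on each call.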