
pandas: fill columns with random numbers so that each row sums to a fixed total

I've got a pandas dataframe like this:

     id  foo  
 0   A   col1 
 1   A   col2  
 2   B   col1  
 3   B   col3  
 4   D   col4  
 5   C   col2  

I'd like to create four additional columns (col1, col2, col3, col4), one for each unique value in the foo column:

     id  foo   col1 col2 col3 col4
 0   A   col1   75   20   5    0
 1   A   col2   20   80   0    0
 2   B   col1   82   10   8    0
 3   B   col3   5    4   80   11
 4   D   col4   0    5   10   85
 5   C   col2   12   78   5    5

The logic for creating the columns is as follows:

If foo = col1, then col1 contains a random number between 75 and 100, and the other columns (col2, col3, col4) contain random numbers such that each row totals 100. The same rule applies when foo names one of the other columns.

I can manually create a new column and assign a random number, but I'm unsure how to enforce the constraint that each row sums to 100.

Appreciate any help!

Kvothe asked Oct 21 '25 12:10


1 Answer

My two cents

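import numpy as np
import pandas as pd

# df is the dataframe from the question
d=[]
# dominant value for each row; randint's upper bound is exclusive, so 101 allows 100
s=np.random.randint(75,101,size=len(df))

for x in 100-s:
    # random positive weights (starting at 1 avoids dividing by a zero sum)
    a=np.random.randint(1,100, size=3)
    # split the remainder x across the other three columns
    b=np.random.multinomial(x, a/a.sum())
    d.append(b.tolist())
# shuffle each row, so the dominant value ends up in a random position
s=[np.random.choice(x,4,replace=False) for x in np.column_stack((s,np.array(d)))]

# pass axis by keyword; the positional form was removed in pandas 2.0
df=pd.concat([df,pd.DataFrame(s,index=df.index)],axis=1)
df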

  id   foo   0   1   2   3
0  A  col1  16   1   7  76
1  A  col2   4   2  91   3
2  B  col1   4   4   1  91
3  B  col3  78   8   8   6
4  D  col4   8  87   3   2
5  C  col2   2   0  11  87
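
Note that the shuffle puts the large value in an arbitrary position, not necessarily in the column named by foo. Below is a minimal sketch of a variant that ties it to foo. The function name fill_row_totals and its parameters (key, total, low, seed) are hypothetical, chosen for illustration, and it assumes foo has at least two unique values. It uses NumPy's Generator API and Dirichlet weights instead of the legacy np.random calls above, which is a substitution rather than the original approach.

```python
import numpy as np
import pandas as pd

def fill_row_totals(df, key="foo", total=100, low=75, seed=None):
    """Return a copy of df with one new column per unique value of `key`.

    In each row, the column named by `key` gets a random integer in
    [low, total]; the other columns share the remainder at random,
    so every row sums to `total`.
    """
    rng = np.random.default_rng(seed)
    cols = sorted(df[key].unique())
    n, k = len(df), len(cols)

    # Dominant value per row; endpoint=True makes the upper bound inclusive.
    main = rng.integers(low, total, size=n, endpoint=True)

    values = np.zeros((n, k), dtype=int)
    # Position of each row's dominant column within `cols`.
    pos = pd.Index(cols).get_indexer(df[key])
    for i in range(n):
        # Split the remainder across the other k-1 columns with random weights.
        rest = rng.multinomial(total - main[i], rng.dirichlet(np.ones(k - 1)))
        values[i, pos[i]] = main[i]
        values[i, np.arange(k) != pos[i]] = rest

    return df.join(pd.DataFrame(values, columns=cols, index=df.index))

df = pd.DataFrame({"id": list("AABBDC"),
                   "foo": ["col1", "col2", "col1", "col3", "col4", "col2"]})
out = fill_row_totals(df, seed=0)
print(out)
```

Passing a seed only makes the draw reproducible; the row-sum guarantee comes from the multinomial split, which always distributes exactly the remainder.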
BENY answered Oct 23 '25 03:10