
Python: Fill 'na' in pandas column with random elements from a list

Tags:

python

pandas

I am trying to fill 'NA' in a pandas column by randomly selecting elements from a list.

For example:

import pandas as pd
df = pd.DataFrame()
df['A'] = [1, 2, None, 5, 53, None]
fill_list = [22, 56, 84]

Is it possible to write a function which takes the pandas DF with column name as input and replaces all NA by randomly selecting elements from the list 'fill_list'?

fun(df['column_name'], fill_list)
bazinga asked Jan 20 '26 15:01



1 Answer

Create a new Series with numpy.random.choice, then replace the NaNs with fillna or combine_first (the filled values are random, so your output will differ):

import numpy as np

# fillna aligns on index, so give the filler Series the same index as df
filler = pd.Series(np.random.choice(fill_list, size=len(df.index)), index=df.index)
df['A'] = df['A'].fillna(filler)
# alternative
# df['A'] = df['A'].combine_first(filler)
print(df)
      A
0   1.0
1   2.0
2  84.0
3   5.0
4  53.0
5  56.0
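
One thing to watch with this approach: fillna matches the filler Series to the column by index label, not by position. Here is a minimal sketch of what that means, assuming a hypothetical frame with a non-default index ('a' to 'f'); the names unaligned, aligned, and filled are only for illustration:

```python
import numpy as np
import pandas as pd

fill_list = [22, 56, 84]
# Hypothetical frame with string labels instead of the default 0..5 index
df = pd.DataFrame({'A': [1, 2, None, 5, 53, None]}, index=list('abcdef'))

# Filler with the default 0..5 index: its labels never match 'a'..'f',
# so fillna finds no replacement values and the NaNs stay
unaligned = pd.Series(np.random.choice(fill_list, size=len(df)))
still_missing = df['A'].fillna(unaligned).isna().sum()

# Filler that reuses df.index: every NaN label has a matching value
aligned = pd.Series(np.random.choice(fill_list, size=len(df)), index=df.index)
filled = df['A'].fillna(aligned)
```

With the default RangeIndex from the question both versions behave the same, so this only matters once the frame has been filtered, sorted, or re-indexed.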

Or draw only as many values as there are NaNs:

# boolean mask of the NaN rows
m = df['A'].isnull()
# number of NaNs to fill
n_missing = m.sum()
# draw exactly one random value per NaN
s = np.random.choice(fill_list, size=n_missing)
# assign the drawn values to the NaN positions
df.loc[m, 'A'] = s
print(df)
      A
0   1.0
1   2.0
2  56.0
3   5.0
4  53.0
5  56.0
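
Since the question asks for a function, the mask approach above could be wrapped roughly like this. It is only a sketch: the name fill_na_random and the optional seed parameter are my own additions (not from the question), and it uses NumPy's Generator API (np.random.default_rng) instead of np.random.choice so that results can be made reproducible:

```python
import numpy as np
import pandas as pd

def fill_na_random(series, fill_list, seed=None):
    """Return a copy of `series` with each NaN replaced by a random draw from `fill_list`."""
    rng = np.random.default_rng(seed)   # seed=None gives a fresh random state
    out = series.copy()                 # leave the caller's Series untouched
    mask = out.isna()                   # True where a value is missing
    # one independent draw (with replacement) per missing value
    out.loc[mask] = rng.choice(fill_list, size=mask.sum())
    return out

df = pd.DataFrame({'A': [1, 2, None, 5, 53, None]})
df['A'] = fill_na_random(df['A'], [22, 56, 84], seed=0)
```

Because the assignment goes through a boolean mask, it works by position and does not depend on what the index looks like. Drop the seed argument if you want different values on every call.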
jezrael answered Jan 22 '26 06:01
