Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

average of corresponding elements of a list of data frames

I have a list of 3 data frames (same shape and same column names). I want to take the average of all the corresponding elements in each data frame.

df3 = pd.DataFrame(np.random.randn(8, 3), columns= ['M', 'N', 'O'])
df4 = pd.DataFrame(np.random.randn(8, 3), columns= ['M', 'N', 'O'])
df5 = pd.DataFrame(np.random.randn(8, 3), columns= ['M', 'N', 'O'])

Currently, I am planning to do it this way.

pd.DataFrame.add(df5, df4, df3)/3

But I have a whole bunch of dataframes coming in - let's say a list of 100 data frames. Please suggest a better way to approach this problem. Is there a way to do without looping them?

like image 447
Praveen Gupta Sanka Avatar asked Sep 13 '25 18:09

Praveen Gupta Sanka


2 Answers

You can just use sum on the list and then divide by the len of the list, regardless of possible nans in the data frame:

list_of_dfs = [df3, df4, df5]
sum(list_of_dfs)/len(list_of_dfs)

#           M          N            O
#0  -0.264907   0.465489     0.238963
#1  -1.020216   0.235631    -0.227061
#2   0.331727   1.061115     0.188867

Or you can use reduce method to call pd.DataFrame.add to add all data frames in the list together:

reduce(pd.DataFrame.add, list_of_dfs)/len(list_of_dfs)
#           M          N            O
#0  -0.264907   0.465489     0.238963
#1  -1.020216   0.235631    -0.227061

If using python 3, import reduce firstly:

from functools import reduce
like image 111
Psidom Avatar answered Sep 16 '25 07:09

Psidom


You could concatenate your list of dataframes, reset index, then use groupby to get the mean.

df3 = pd.DataFrame(np.random.randn(8, 3), columns= ['M', 'N', 'O'])
df4 = pd.DataFrame(np.random.randn(8, 3), columns= ['M', 'N', 'O'])
df5 = pd.DataFrame(np.random.randn(8, 3), columns= ['M', 'N', 'O'])

# create list of dfs
dflist = [df3, df4, df5]

# reset_index creates new var you will use for groupby
joined = pd.concat(dflist).reset_index()
means = joined.groupby('index').mean()


print(means)
              M         N         O
index                              
0     -0.345697  0.354027 -0.125968
1      0.018404 -0.130260 -0.721745
2     -0.396263  0.475361 -1.173686
3      0.560518  0.774065  0.188009
4      1.521029  0.332139  0.078057
5     -0.180118 -0.948808 -0.889329
6      0.476496  0.236885  0.774599
7     -0.340693 -0.598964  0.381229
like image 20
Andrew Avatar answered Sep 16 '25 09:09

Andrew