
Number of missing entries when merging DataFrames

In an exercise, I was asked to merge three DataFrames with an inner join (df1 + df2 + df3 = mergedDf). A follow-up question then asked how many entries I lost in this 3-way merge.

import pandas as pd

#DataFrame1
df1 = pd.DataFrame(columns=["Goals","Medals"],data=[[5,2],[1,0],[3,1]])
df1.index = ['Argentina','Angola','Bolivia']
print(df1)
            Goals    Medals
Argentina       5         2
Angola          1         0
Bolivia         3         1

#DataFrame2
df2 = pd.DataFrame(columns=["Dates","Medals"],data=[[1,0],[2,1],[2,2]])
df2.index = ['Venezuela','Africa','Argentina']
print(df2)
            Dates    Medals
Venezuela       1         0
Africa          2         1
Argentina       2         2

#DataFrame3
df3 = pd.DataFrame(columns=["Players","Goals"],data=[[11,5],[11,1],[10,0]])
df3.index = ['Argentina','Australia','Belgica']
print(df3)
           Players    Goals
Argentina       11        5
Australia       11        1
Belgica         10        0

#mergedDf
mergedDf = pd.merge(df1,df2,how='inner',left_index=True, right_index=True)
mergedDf = pd.merge(mergedDf,df3,how='inner',left_index=True, right_index=True)
print(mergedDf)
           Goals_x  Medals_x  Dates  Medals_y  Players  Goals_y
Argentina        5         2      2         2       11        5

#Calculate number of lost entries by code

I tried merging everything with an outer join and then subtracting mergedDf, but I don't know how to do this. Can anyone help me?

Agustín Clemente asked Nov 18 '25 02:11


2 Answers

I've found a simple but effective solution:

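Merge the three DataFrames twice, once with an inner join and once with an outer join:

# Df1(), Df2() and Df3() are assumed to return the three DataFrames
df1 = Df1()
df2 = Df2()
df3 = Df3()
# '<Common column>' is a placeholder for the key column shared by all three
inner = pd.merge(pd.merge(df1,df2,on='<Common column>',how='inner'),df3,on='<Common column>',how='inner')
outer = pd.merge(pd.merge(df1,df2,on='<Common column>',how='outer'),df3,on='<Common column>',how='outer')

Now the number of lost entries (rows) is the difference between the two lengths:

lost = len(outer) - len(inner)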
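
For the frames in the question the key is the index rather than a column, so the same idea can be sketched with left_index/right_index. This is only an illustration under that assumption; the helper name three_way_merge is made up for this example.

```python
import pandas as pd

# Sample frames from the question, keyed by country in the index.
df1 = pd.DataFrame({"Goals": [5, 1, 3], "Medals": [2, 0, 1]},
                   index=["Argentina", "Angola", "Bolivia"])
df2 = pd.DataFrame({"Dates": [1, 2, 2], "Medals": [0, 1, 2]},
                   index=["Venezuela", "Africa", "Argentina"])
df3 = pd.DataFrame({"Players": [11, 11, 10], "Goals": [5, 1, 0]},
                   index=["Argentina", "Australia", "Belgica"])


def three_way_merge(how):
    """Merge df1, df2 and df3 on their index with the given join type."""
    step = pd.merge(df1, df2, how=how, left_index=True, right_index=True)
    return pd.merge(step, df3, how=how, left_index=True, right_index=True)


inner = three_way_merge("inner")   # only labels present in all three frames
outer = three_way_merge("outer")   # every label present in any frame
lost = len(outer) - len(inner)
print(lost)  # 6
```

The length difference only equals the number of lost labels when each index has unique labels; duplicated keys multiply rows in both merges.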
Agustín Clemente answered Nov 20 '25 18:11


Solution with an outer join and the indicator parameter: merge twice, then count the rows that are not marked both in both indicator columns a and b by summing the boolean mask (True values are counted as 1s):

mergedDf = pd.merge(df1,df2,how='outer',left_index=True, right_index=True, indicator='a')
mergedDf = pd.merge(mergedDf,df3,how='outer',left_index=True, right_index=True, indicator='b')
print(mergedDf)
           Goals_x  Medals_x  Dates  Medals_y           a  Players  Goals_y  \
Africa         NaN       NaN    2.0       1.0  right_only      NaN      NaN   
Angola         1.0       0.0    NaN       NaN   left_only      NaN      NaN   
Argentina      5.0       2.0    2.0       2.0        both     11.0      5.0   
Australia      NaN       NaN    NaN       NaN         NaN     11.0      1.0   
Belgica        NaN       NaN    NaN       NaN         NaN     10.0      0.0   
Bolivia        3.0       1.0    NaN       NaN   left_only      NaN      NaN   
Venezuela      NaN       NaN    1.0       0.0  right_only      NaN      NaN   

                    b  
Africa      left_only  
Angola      left_only  
Argentina        both  
Australia  right_only  
Belgica    right_only  
Bolivia     left_only  
Venezuela   left_only

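# a row survives both inner joins only if it is 'both' in a and in b,
# so it is lost as soon as either indicator is something else
missing = ((mergedDf['a'] != 'both') | (mergedDf['b'] != 'both')).sum()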
print (missing)
6
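
In the sample data every lost label appears in exactly one frame. A label that appears in two frames but not the third is a case worth checking separately. The sketch below uses made-up frames (f1, f2, f3 and the label "Chile" are hypothetical, not from the question) to show how the two indicator columns look in that situation.

```python
import pandas as pd

# Hypothetical data: "Chile" is in f1 and f2 but missing from f3.
f1 = pd.DataFrame({"x": [1, 2]}, index=["Argentina", "Chile"])
f2 = pd.DataFrame({"y": [3, 4]}, index=["Argentina", "Chile"])
f3 = pd.DataFrame({"z": [5]}, index=["Argentina"])

m = pd.merge(f1, f2, how="outer", left_index=True, right_index=True,
             indicator="m12")
m = pd.merge(m, f3, how="outer", left_index=True, right_index=True,
             indicator="m123")

# A row survives both inner joins only if it is 'both' in both columns.
kept = (m["m12"] == "both") & (m["m123"] == "both")
lost = int((~kept).sum())

# Requiring BOTH indicators to differ from 'both' misses Chile,
# because Chile is 'both' in m12 and 'left_only' in m123.
both_differ = int(((m["m12"] != "both") & (m["m123"] != "both")).sum())

print(lost)         # 1
print(both_differ)  # 0
```

Chile is dropped by the 3-way inner join, so a mask that reports 1 here matches what the inner join actually loses.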

Another solution is to use the inner join and, for each DataFrame, count the index labels that are not in mergedDf.index, then sum those counts:

mergedDf = pd.merge(df1,df2,how='inner',left_index=True, right_index=True)
mergedDf = pd.merge(mergedDf,df3,how='inner',left_index=True, right_index=True)
vals = mergedDf.index
print (vals)
Index(['Argentina'], dtype='object')

dfs = [df1, df2, df3]
missing = sum((~x.index.isin(vals)).sum() for x in dfs)
print (missing)
6

Another solution, if the values in each index are unique:

dfs = [df1, df2, df3]
L = [set(x.index) for x in dfs]

#https://stackoverflow.com/a/25324329/2901002
missing = len(set.union(*L) - set.intersection(*L))
print (missing)
6
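
If the lost labels themselves are needed rather than just the count, the same union-minus-intersection idea can be sketched with pandas Index set operations for any number of frames. The function name lost_labels is made up for this example, and unique index labels are assumed.

```python
from functools import reduce

import pandas as pd


def lost_labels(frames):
    """Return index labels found in at least one frame but not in all."""
    indexes = [f.index for f in frames]
    all_labels = reduce(lambda left, right: left.union(right), indexes)
    common = reduce(lambda left, right: left.intersection(right), indexes)
    return all_labels.difference(common)


# Sample frames from the question, keyed by country in the index.
df1 = pd.DataFrame({"Goals": [5, 1, 3], "Medals": [2, 0, 1]},
                   index=["Argentina", "Angola", "Bolivia"])
df2 = pd.DataFrame({"Dates": [1, 2, 2], "Medals": [0, 1, 2]},
                   index=["Venezuela", "Africa", "Argentina"])
df3 = pd.DataFrame({"Players": [11, 11, 10], "Goals": [5, 1, 0]},
                   index=["Argentina", "Australia", "Belgica"])

lost = lost_labels([df1, df2, df3])
print(len(lost))  # 6
```

Note that this counts each lost label once, while the per-DataFrame isin sum counts a label once for every frame it appears in, so the two can differ when a label is shared by some but not all frames.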
jezrael answered Nov 20 '25 16:11