Merging DataFrames within a loop [duplicate]

Question

I have a folder with numerous CSV files which look like this:

csv1

    2006    Percent       Land_Use
0     13   5.379564      Developed
1      8  25.781580  Grass/Pasture
2      4  54.265050           Crop
3     15   0.363983          Water
4     16   6.244104       Wetlands
5      6   4.691764         Forest
6      1   3.031494        Alfalfa
7     11   0.137424      Shrubland
8      5   0.003671          Vetch
9      3   0.055412         Barren
10     7   0.009531          Grass
11    12   0.036423           Tree

csv2

   2007    Percent       Land_Use
0     13   2.742430      Developed
1      4  56.007242           Crop
2      8  24.227963  Grass/Pasture
3     16   8.839979       Wetlands
4      6   6.181062         Forest
5      1   1.446668        Alfalfa
6     15   0.366116          Water
7      3   0.127760         Barren
8     11   0.034426      Shrubland
9      7   0.000827          Grass
10    12   0.025528           Tree

csv3

    2008    Percent       Land_Use
0    13   1.863809      Developed
1     8  31.455578  Grass/Pasture
2     4  57.896856           Crop
3    16   2.693929       Wetlands
4     6   4.417966         Forest
5     1   1.239176        Alfalfa
6     7   0.130849          Grass
7    15   0.266536          Water
8    11   0.004571      Shrubland
9     3   0.030731         Barren

I want to merge them all together into one DataFrame on Land_Use.

I am reading in the files like this:

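import os
import pandas as pd

pth = 'G:\\'  # a raw string cannot end in a single backslash, so escape it instead
for f in os.listdir(pth):
    df = pd.read_csv(os.path.join(pth, f))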

but I can't figure out how to merge all the individual DataFrames after that. I figured out how to concat them, but that isn't what I want. The type of merge I want is outer.

If I hard-coded the path to each CSV file I would merge them like this, but I do NOT want to write out a path for each file as there are many of them:

    one=pd.read_csv(r'G:\one.csv')
    two=pd.read_csv(r'G:\two.csv')
    three=pd.read_csv(r'G:\three.csv')
    merge=pd.merge(one,two, on=['Land_Use'], how='outer')
    mergetwo=pd.merge(merge,three,on=['Land_Use'], how='outer')

jezrael · Accepted Answer

I think you can use functools.reduce in Python 3:

import functools

dfs = [df1,df2,df3]

df = functools.reduce(lambda left,right: pd.merge(left,right,on='Land_Use',how='outer'),dfs)
print (df)
    2006  Percent_x       Land_Use  2007  Percent_y  2008    Percent
0     13   5.379564      Developed  13.0   2.742430  13.0   1.863809
1      8  25.781580  Grass/Pasture   8.0  24.227963   8.0  31.455578
2      4  54.265050           Crop   4.0  56.007242   4.0  57.896856
3     15   0.363983          Water  15.0   0.366116  15.0   0.266536
4     16   6.244104       Wetlands  16.0   8.839979  16.0   2.693929
5      6   4.691764         Forest   6.0   6.181062   6.0   4.417966
6      1   3.031494        Alfalfa   1.0   1.446668   1.0   1.239176
7     11   0.137424      Shrubland  11.0   0.034426  11.0   0.004571
8      5   0.003671          Vetch   NaN        NaN   NaN        NaN
9      3   0.055412         Barren   3.0   0.127760   3.0   0.030731
10     7   0.009531          Grass   7.0   0.000827   7.0   0.130849
11    12   0.036423           Tree  12.0   0.025528   NaN        NaN

In Python 2, reduce is a built-in, so no import is needed:

df = reduce(lambda left,right: pd.merge(left,right,on='Land_Use',how='outer'),dfs)
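
Since the question asks about merging inside a loop, it may help to see what the reduce call amounts to when written out by hand. The following is only a sketch: merge_in_loop is a hypothetical helper name, and the two small frames are cut down from the question's 2006 and 2007 tables for illustration.

```python
import pandas as pd


def merge_in_loop(frames, key="Land_Use"):
    """Outer-merge a non-empty list of DataFrames one at a time on `key`."""
    merged = frames[0]
    for frame in frames[1:]:
        # Each pass folds the next frame into the running result,
        # which is the same thing functools.reduce does with the lambda.
        merged = pd.merge(merged, frame, on=key, how="outer")
    return merged


# Miniature frames using two rows from the question's data (illustration only).
df1 = pd.DataFrame({"2006": [13, 5], "Land_Use": ["Developed", "Vetch"]})
df2 = pd.DataFrame({"2007": [13], "Land_Use": ["Developed"]})

result = merge_in_loop([df1, df2])
print(result)
```

Vetch appears only in the first frame, so the outer merge keeps its row and leaves the 2007 value as NaN.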

Working solution with glob:

import pandas as pd
import functools
import glob

pth = 'a/*.csv'
files = glob.glob(pth)
dfs = [pd.read_csv(f, sep=';') for f in files]

df = functools.reduce(lambda left,right: pd.merge(left,right,on='Land_Use', how='outer'),dfs)
print (df)
    2006  Percent_x       Land_Use  2008  Percent_y  2007    Percent
0     13   5.379564      Developed  13.0   1.863809  13.0   2.742430
1      8  25.781580  Grass/Pasture   8.0  31.455578   8.0  24.227963
2      4  54.265050           Crop   4.0  57.896856   4.0  56.007242
3     15   0.363983          Water  15.0   0.266536  15.0   0.366116
4     16   6.244104       Wetlands  16.0   2.693929  16.0   8.839979
5      6   4.691764         Forest   6.0   4.417966   6.0   6.181062
6      1   3.031494        Alfalfa   1.0   1.239176   1.0   1.446668
7     11   0.137424      Shrubland  11.0   0.004571  11.0   0.034426
8      5   0.003671          Vetch   NaN        NaN   NaN        NaN
9      3   0.055412         Barren   3.0   0.030731   3.0   0.127760
10     7   0.009531          Grass   7.0   0.130849   7.0   0.000827
11    12   0.036423           Tree   NaN        NaN  12.0   0.025528
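
Two things in the output above are worth noting: the year columns follow whatever order glob happened to return the files in, and the Percent columns end up with the generic _x and _y suffixes. With more files, those default suffixes can collide in recent pandas versions. One possible way around both, sketched below, is to sort the file list and rename Percent to Percent_<year> before merging. The helper names, the file names (2006.csv and so on) and the comma separator are assumptions made for this demo, and the sketch also assumes each file has exactly one column besides Percent and Land_Use.

```python
import functools
import glob
import os
import tempfile

import pandas as pd


def read_with_year_suffix(path):
    """Read one CSV and rename Percent to Percent_<year> (hypothetical helper)."""
    df = pd.read_csv(path)
    # Assumption: the one column that is neither Percent nor Land_Use is the year.
    year = next(c for c in df.columns if c not in ("Percent", "Land_Use"))
    return df.rename(columns={"Percent": "Percent_" + str(year)})


def merge_folder(pattern):
    """Outer-merge every CSV matching the glob pattern on Land_Use."""
    files = sorted(glob.glob(pattern))  # sorted() gives a repeatable column order
    frames = [read_with_year_suffix(f) for f in files]
    return functools.reduce(
        lambda left, right: pd.merge(left, right, on="Land_Use", how="outer"),
        frames,
    )


# Demo files: a few rows taken from the question, written to a temporary folder.
samples = {
    "2006.csv": "2006,Percent,Land_Use\n13,5.379564,Developed\n5,0.003671,Vetch\n",
    "2007.csv": "2007,Percent,Land_Use\n13,2.742430,Developed\n",
    "2008.csv": "2008,Percent,Land_Use\n13,1.863809,Developed\n",
}
with tempfile.TemporaryDirectory() as tmp:
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(text)
    merged = merge_folder(os.path.join(tmp, "*.csv"))

print(merged.columns.tolist())
```

For a real folder, the call would be something like merge_folder('a/*.csv'), with sep passed to read_csv if the files are not comma-separated.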

Tags:

pandas

python-2.7

Stefano Potter

1 Answers

jezrael
