I have two pandas.DataFrames which I would like to combine into one. The dataframes have the same number of columns, in the same order, but have column headings in different languages. How can I efficiently combine these dataframes?
df_ger
index Datum Zahl1 Zahl2
0 1-1-17 1 2
1 2-1-17 3 4
df_uk
index Date No1 No2
0 1-1-17 5 6
1 2-1-17 7 8
desired output
index Datum Zahl1 Zahl2
0 1-1-17 1 2
1 2-1-17 3 4
2 1-1-17 5 6
3 2-1-17 7 8
The only approach I have come up with so far is to rename the column headings and then use pd.concat([df_ger, df_uk], axis=0, ignore_index=True). However, I hope to find a more general approach.
If the columns are always in the same order, you can mechanically rename the columns and then append the rows with pd.concat (DataFrame.append was removed in pandas 2.0), like:
new_cols = {x: y for x, y in zip(df_uk.columns, df_ger.columns)}
df_out = pd.concat([df_ger, df_uk.rename(columns=new_cols)])
Test code:
from io import StringIO
import pandas as pd

df_ger = pd.read_fwf(StringIO(
u"""
index Datum Zahl1 Zahl2
0 1-1-17 1 2
1 2-1-17 3 4"""),
header=1).set_index('index')
df_uk = pd.read_fwf(StringIO(
u"""
index Date No1 No2
0 1-1-17 5 6
1 2-1-17 7 8"""),
header=1).set_index('index')
print(df_uk)
print(df_ger)
new_cols = {x: y for x, y in zip(df_uk.columns, df_ger.columns)}
df_out = pd.concat([df_ger, df_uk.rename(columns=new_cols)])
print(df_out)
Results:
Date No1 No2
index
0 1-1-17 5 6
1 2-1-17 7 8
Datum Zahl1 Zahl2
index
0 1-1-17 1 2
1 2-1-17 3 4
Datum Zahl1 Zahl2
index
0 1-1-17 1 2
1 2-1-17 3 4
0 1-1-17 5 6
1 2-1-17 7 8
Provided you can be sure that the structures of the two dataframes remain the same, I see two options:
1. Keep the dataframe column names of the chosen default language (I assume en_GB) and just copy them over:
df_ger.columns = df_uk.columns
df_combined = pd.concat([df_ger, df_uk], axis=0, ignore_index=True)
This works whatever the column names are. However, technically it remains renaming.
2. Pull the data out of the dataframes as numpy.ndarrays, concatenate them in numpy, and make a dataframe out of the result again:
import numpy as np
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
np_ger_data = df_ger.to_numpy()
np_uk_data = df_uk.to_numpy()
np_combined_data = np.concatenate([np_ger_data, np_uk_data], axis=0)
df_combined = pd.DataFrame(np_combined_data, columns=["Date", "No1", "No2"])
This solution requires more resources (the data is copied into arrays and, when the columns have mixed types, ends up with object dtype), so I would opt for the first one.
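To make the trade-off between the two options concrete, here is a small sketch that builds the combined frame both ways and prints the resulting dtypes. The sample frames are assumed from the question, and the variable names via_concat, via_numpy and restored are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question (assumed, for illustration only)
df_ger = pd.DataFrame({"Datum": ["1-1-17", "2-1-17"], "Zahl1": [1, 3], "Zahl2": [2, 4]})
df_uk = pd.DataFrame({"Date": ["1-1-17", "2-1-17"], "No1": [5, 7], "No2": [6, 8]})

# Option 1: relabel the German frame, then concatenate in pandas
via_concat = pd.concat(
    [df_ger.set_axis(df_uk.columns, axis=1), df_uk],
    axis=0,
    ignore_index=True,
)

# Option 2: go through NumPy; mixed string/integer columns give an object array
combined = np.concatenate([df_ger.to_numpy(), df_uk.to_numpy()], axis=0)
via_numpy = pd.DataFrame(combined, columns=df_uk.columns)

print(via_concat.dtypes)  # numeric columns keep an integer dtype
print(via_numpy.dtypes)   # numeric columns are stored as object

# infer_objects() can recover the numeric dtypes after the round trip
restored = via_numpy.infer_objects()
print(restored.dtypes)
```

Both routes produce the same values; the difference lies in the column dtypes and in the extra copies made on the NumPy route.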