
Python - Pandas combine two dataframes that provide different values

Tags:

python

pandas

I have two different dataframes, each with two columns, and I want to merge them and get the combined (concatenated) values of Column B. The problem is that dataframe 1 has some data that I want to keep. Here is an example to make it clearer:

Dataframe 1

Column A  Column B
House     walls,doors,rooms
Animal    Legs,nose,eyes
Car       tires,engine

Dataframe 2

Column A  Column B
House     windows,kitchen
Bike      wheels,bicycle chain

Desired result

Column A  Column B
House     walls,doors,rooms,windows,kitchen
Animal    Legs,nose,eyes
Car       tires,engine
Bike      wheels,bicycle chain

The merge function doesn't help. I also tried pd.concat followed by some kind of aggregation, but that didn't work either. Does anyone have an idea how to solve this?

max_pepsi asked May 09 '26 11:05


1 Answer

pd.concat([df1, df2]).groupby("Column A")["Column B"].apply(','.join).reset_index()

After concatenating your dataframes, group the rows by Column A, then use apply to join the grouped strings in Column B, and finally turn Column A back into a regular column with reset_index().
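As a rough end-to-end sketch of the steps above, the following rebuilds the two frames from the question and runs the same concat / groupby / apply chain. The frame contents are copied from the question; the bare "," separator and `sort=False` (to keep the keys in first-seen order rather than alphabetical order) are my assumptions, chosen to match the desired result shown in the question.

```python
import pandas as pd

# Sample frames copied from the question ("Car" capitalised as in the desired result).
df1 = pd.DataFrame({
    "Column A": ["House", "Animal", "Car"],
    "Column B": ["walls,doors,rooms", "Legs,nose,eyes", "tires,engine"],
})
df2 = pd.DataFrame({
    "Column A": ["House", "Bike"],
    "Column B": ["windows,kitchen", "wheels,bicycle chain"],
})

# Stack the rows, then join every "Column B" string that shares a "Column A" key.
# sort=False (an assumption, not part of the one-liner) keeps first-seen key order.
df3 = (
    pd.concat([df1, df2])
    .groupby("Column A", sort=False)["Column B"]
    .apply(",".join)
    .reset_index()
)
print(df3)
```

Keys that appear in only one frame (Animal, Car, Bike) pass through unchanged, because joining a single string returns that string.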

Edit: expanding on the comments

To remove duplicates, you can use a set, which keeps only one copy of each element you put into it. Here df3 is the combined dataframe produced above. For each row x, split the string into words, then convert the list of words into a set:

df4 = df3.copy()
df4["Column B"] = df4["Column B"].apply(lambda x: set(w.strip() for w in x.split(",")))

Note that after this, Column B will contain sets. I'll let you figure out how to convert from a set back to a string using a similar pattern.
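For readers who want the whole round trip, here is one possible sketch that removes duplicates and goes straight back to a string. It deliberately swaps the set for `dict.fromkeys`, because a set does not guarantee item order while a dict keeps insertion order. The helper name `dedupe_items` and the sample frame (with a repeated "doors") are hypothetical, made up purely for illustration.

```python
import pandas as pd

# Hypothetical combined frame with a repeated item ("doors") to show the effect.
df3 = pd.DataFrame({
    "Column A": ["House", "Bike"],
    "Column B": ["walls,doors,rooms,doors,kitchen", "wheels,bicycle chain"],
})


def dedupe_items(text: str) -> str:
    """Drop repeated comma-separated items while keeping first-seen order."""
    items = [item.strip() for item in text.split(",")]
    # dict.fromkeys keeps one entry per item and preserves insertion order,
    # which a plain set would not guarantee.
    return ",".join(dict.fromkeys(items))


# assign() returns a new frame, so df3 itself is left untouched.
df4 = df3.assign(**{"Column B": df3["Column B"].apply(dedupe_items)})
print(df4)
```

If item order does not matter, `",".join(sorted(set(items)))` would be a closer match to the set-based approach described above.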

sigma1510 answered May 11 '26 01:05
