Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Adding a column with values from another dataframe based on column conditions

I have two dataframes of differing index lengths that look like this:

df_1:

    State  Month  Total Time ... N columns
     AL     4       1000
     AL     5       500
      .
      .
      .
     VA     11      750
     VA     12      1500 

df_2:

    State   Month ... N columns
     AL      4        
     AL      5
      .
      .
      .
     VA      11
     VA      12

I would like to add a Total Time column to df_2 that uses the values from the Total Time column of df_1 if the State and Month value are the same between dataframes. Ultimately, I would end up with:

df_2:

    State  Month  Total Time ... N columns
     AL     4       1000
     AL     5       500
      .
      .
      .
     VA     11      750
     VA     12      1500 

I want to do this conditionally since the index lengths are not the same. I have tried this so far:

def f(row):
     if (row['State'] == row['State']) & (row['Month'] == row['Month']):
         val = x for x in df_1["Total Time"]
     return val

df_2['Total Time'] = df_2.apply(f, axis=1)

This did not work. What method would you use to accomplish this? Any help is appreciated!

like image 936
ccsucic Avatar asked Sep 16 '25 06:09

ccsucic


2 Answers

You can do this:

Consider my sample dataframes:

In [2327]: df_1                                                                                                                                                                                              
Out[2327]: 
  State  Month  Total Time
0    AL      2        1000
1    AB      4         500
2    BC      1         600

In [2328]: df_2                                                                                                                                                                                              
Out[2328]: 
  State  Month
0    AL      2
1    AB      5

In [2329]: df_2 = pd.merge(df_2, df_1, on=['State', 'Month'], how='left')                                                                                                                                      

In [2330]: df_2                                                                                                                                                                                              
Out[2330]: 
  State  Month  Total Time
0    AL      2      1000.0
1    AB      5         NaN
like image 174
Mayank Porwal Avatar answered Sep 17 '25 19:09

Mayank Porwal


As mentioned in other comment, pd.merge() is how you would join two dataframes and extract a column. The issue is that merging solely on 'State' and 'Month' would result in every permutation becoming a new column (all Al-4 in df_1 would join with all other AL-4 in df_2).

With your example, there'd be

df_1
  State  Month  Total Time  df_1 col...
0    AL      4        1000            6
1    AL      4         500            7
2    VA     12         750            8
3    VA     12        1500            9

df_2
  State  Month  df_2 col...
0    AL      4            1
1    AL      4            2
2    VA     12            3
3    VA     12            4


df_1_cols_to_use = ['State', 'Month', 'Total Time']
# note the selection of the column to use from df_1. We only want the column
# we're merging on, plus the column(s) we want to bring in, in this case 'Total Time'
new_df = pd.merge(df_2, df_1[df_1_cols_to_use], on=['State', 'Month'], how='left')

new_df:
  State  Month  df_2 col...  Total Time
0    AL      4            1        1000
1    AL      4            1         500
2    AL      4            2        1000
3    AL      4            2         500
4    VA     12            3         750
5    VA     12            3        1500
6    VA     12            4         750
7    VA     12            4        1500

You mention these have differing index lengths. Based on the parameters of the question, it's not possible to determine what value of Total Time would match up with a row in df_2. If there's three AL-4 entries in df_2, do they each get 1000, 500, or some combination? That info would be needed. Without this, this would be the best guess at getting all possibilities.

like image 36
Foundry Avatar answered Sep 17 '25 20:09

Foundry