Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas insert only new values from one df to another with conditions

my pandas dataframes df1 and df2

df2 is empty dataframe.

df1:

id name address startdate enddate middate
0 abc 123st west drive 01/31/2021 null 01/31/2021
1 bcc 23st east drive 01/31/2021 null 01/31/2021
2 abc 24st orlando 02/31/2021 null 02/31/2021
3 ac 2st coast rd 01/31/2021 null 01/31/2021
4 bcc 23st east drive 02/31/2021 null 02/31/2021

I want to insert df1 into df2 one by one based on name and address column,

  1. if name already exists
  2. need to check address
  3. if same address then crunch the record by updating middate of previous by start of new record.
  4. if different address update previous record then end and mid as start of new record

in our case id 0 and 2 are having same names but different address so the df2 should be like

df2:

id name address startdate enddate middate
0 abc 123st west drive 01/31/2021 02/31/2021 02/31/2021
1 abc 24st orlando 02/31/2021 null 02/31/2021
2 bcc 23st east drive 01/31/2021 null 02/31/2021
3 ac 2st coast rd 01/31/2021 null 01/31/2021

I used merge but not getting desired output. thanks in advance


1 Answers

Using groupby()

  • collapse name and address
  • groupby() name to update enddate and middate to startdate of next row in the group
df1 = pd.read_csv(io.StringIO("""id name    address startdate   enddate middate
0   abc 123st west drive    01/31/2021  null    01/31/2021
1   bcc 23st east drive 01/31/2021  null    01/31/2021
2   abc 24st orlando    02/31/2021  null    02/31/2021
3   ac  2st coast rd    01/31/2021  null    01/31/2021
4   bcc 23st east drive 02/31/2021  null    02/31/2021"""), sep="\t")

# collapse the records
# collapse the records
df2 = df1.groupby(["name","address"], as_index=False).agg({"startdate":"first","enddate":"last","middate":"last"})

# update the dates
df2 = (df2.groupby(["name"])
 .apply(lambda dfa: dfa.assign(enddate=dfa.startdate.shift(-1),
                               middate=np.where(dfa.startdate.shift(-1).isna(),
                                                dfa.middate,
                                                dfa.startdate.shift(-1))))
 .reset_index(drop=True)
) 
name address id startdate enddate middate
0 abc 123st west drive 0 01/31/2021 02/31/2021 02/31/2021
1 abc 24st orlando 2 02/31/2021 nan 02/31/2021
2 ac 2st coast rd 3 01/31/2021 nan 01/31/2021
3 bcc 23st east drive 4 02/31/2021 nan 02/31/2021
like image 139
Rob Raymond Avatar answered Dec 16 '25 11:12

Rob Raymond



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!