Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Join two data frames using the last row from each group

Our weather station recorded daily weather data (about 7 rows/observations) per week. We collected disease data once per week (one observation/row per week). How can I join the last row of the weather_df with disease_df while keeping other cells blank? I've tried using left_join, but it incorrectly adds one value from disease_df to all days of the week instead of recording the disease data at the end of the week.

Reproducible example

weather_df <- structure(list(week = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("1", "2"), class = "factor"), 
    date = structure(c(1401062400, 1401148800, 1401235200, 1401321600, 
    1401408000, 1401494400, 1401580800, 1401667200, 1402272000, 
    1402358400, 1402444800, 1402531200, 1402617600, 1402704000, 
    1402790400, 1402876800), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    rainfall = c(0.8, 0, 1.4, 3, 0, 1, 0, 0, 3, 0, 2.4, 1.2, 
    0, 0, 0, 0), temperature = c(23.6, 21.9, 22.6, 20.1, 21.9, 
    20.3, 17.3, 15.5, 23.1, 22.4, 21.1, 20.4, 21.2, 21.5, 20.2, 
    20.4)), row.names = c(NA, -16L), class = c("tbl_df", "tbl", 
"data.frame"))


disease_df <- structure(list(week = structure(1:2, levels = c("1", "2"), class = "factor"), 
    disease_intensity = c(0.4, 0.3)), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"))



combine_df <- left_join(weather_df, disease_df, by = "week")

Here is the output

enter image description here

As you can see 0.4 is added to all days of week 1 and 0.3 is added to all days of week 2. I just want these to be added to the last days of both weeks,while keeping the other cells blank.

like image 419
Ahsk Avatar asked Dec 11 '25 00:12

Ahsk


1 Answers

You can merge disease_df and weather_df with "last-match", and join the result back to weather_df.

library(dplyr)

left_join(disease_df, weather_df, by = "week", multiple = "last") %>%
  left_join(weather_df, .)

Another option is to create a flag column in weather_df indicating the last day of each week, and then merge to disease_df.

weather_df %>%
  mutate(flag = row_number() == which.max(date), .by = week) %>%
  left_join(mutate(disease_df, flag = TRUE), by = join_by(week, flag)) %>%
  select(-flag)

Output
# # A tibble: 16 × 5
#    week  date                rainfall temperature disease_intensity
#    <fct> <dttm>                 <dbl>       <dbl>             <dbl>
#  1 1     2014-05-26 00:00:00      0.8        23.6              NA  
#  2 1     2014-05-27 00:00:00      0          21.9              NA  
#  3 1     2014-05-28 00:00:00      1.4        22.6              NA  
#  4 1     2014-05-29 00:00:00      3          20.1              NA  
#  5 1     2014-05-30 00:00:00      0          21.9              NA  
#  6 1     2014-05-31 00:00:00      1          20.3              NA  
#  7 1     2014-06-01 00:00:00      0          17.3              NA  
#  8 1     2014-06-02 00:00:00      0          15.5               0.4
#  9 2     2014-06-09 00:00:00      3          23.1              NA  
# 10 2     2014-06-10 00:00:00      0          22.4              NA  
# 11 2     2014-06-11 00:00:00      2.4        21.1              NA  
# 12 2     2014-06-12 00:00:00      1.2        20.4              NA  
# 13 2     2014-06-13 00:00:00      0          21.2              NA  
# 14 2     2014-06-14 00:00:00      0          21.5              NA  
# 15 2     2014-06-15 00:00:00      0          20.2              NA  
# 16 2     2014-06-16 00:00:00      0          20.4               0.3
like image 160
Darren Tsai Avatar answered Dec 13 '25 16:12

Darren Tsai



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!