Our weather station recorded daily weather data (about 7 rows/observations) per week. We collected disease data once per week (one observation/row per week). How can I join the last row of the weather_df with disease_df while keeping other cells blank? I've tried using left_join, but it incorrectly adds one value from disease_df to all days of the week instead of recording the disease data at the end of the week.
Reproducible example
weather_df <- structure(list(week = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("1", "2"), class = "factor"),
date = structure(c(1401062400, 1401148800, 1401235200, 1401321600,
1401408000, 1401494400, 1401580800, 1401667200, 1402272000,
1402358400, 1402444800, 1402531200, 1402617600, 1402704000,
1402790400, 1402876800), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
rainfall = c(0.8, 0, 1.4, 3, 0, 1, 0, 0, 3, 0, 2.4, 1.2,
0, 0, 0, 0), temperature = c(23.6, 21.9, 22.6, 20.1, 21.9,
20.3, 17.3, 15.5, 23.1, 22.4, 21.1, 20.4, 21.2, 21.5, 20.2,
20.4)), row.names = c(NA, -16L), class = c("tbl_df", "tbl",
"data.frame"))
disease_df <- structure(list(week = structure(1:2, levels = c("1", "2"), class = "factor"),
disease_intensity = c(0.4, 0.3)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
combine_df <- left_join(weather_df, disease_df, by = "week")
Here is the output

As you can see 0.4 is added to all days of week 1 and 0.3 is added to all days of week 2. I just want these to be added to the last days of both weeks,while keeping the other cells blank.
You can merge disease_df and weather_df with "last-match", and join the result back to weather_df.
library(dplyr)
left_join(disease_df, weather_df, by = "week", multiple = "last") %>%
left_join(weather_df, .)
Another option is to create a flag column in weather_df indicating the last day of each week, and then merge to disease_df.
weather_df %>%
mutate(flag = row_number() == which.max(date), .by = week) %>%
left_join(mutate(disease_df, flag = TRUE), by = join_by(week, flag)) %>%
select(-flag)
# # A tibble: 16 × 5
# week date rainfall temperature disease_intensity
# <fct> <dttm> <dbl> <dbl> <dbl>
# 1 1 2014-05-26 00:00:00 0.8 23.6 NA
# 2 1 2014-05-27 00:00:00 0 21.9 NA
# 3 1 2014-05-28 00:00:00 1.4 22.6 NA
# 4 1 2014-05-29 00:00:00 3 20.1 NA
# 5 1 2014-05-30 00:00:00 0 21.9 NA
# 6 1 2014-05-31 00:00:00 1 20.3 NA
# 7 1 2014-06-01 00:00:00 0 17.3 NA
# 8 1 2014-06-02 00:00:00 0 15.5 0.4
# 9 2 2014-06-09 00:00:00 3 23.1 NA
# 10 2 2014-06-10 00:00:00 0 22.4 NA
# 11 2 2014-06-11 00:00:00 2.4 21.1 NA
# 12 2 2014-06-12 00:00:00 1.2 20.4 NA
# 13 2 2014-06-13 00:00:00 0 21.2 NA
# 14 2 2014-06-14 00:00:00 0 21.5 NA
# 15 2 2014-06-15 00:00:00 0 20.2 NA
# 16 2 2014-06-16 00:00:00 0 20.4 0.3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With