I have a data frame containing information on incidents for countries around the world. The structure of the data frame is similar to the following example:
a <- data.frame(country = c("AAA" , "BBB" , "CCC") ,
incident = rep("disaster" , times = 3) ,
'start year' = c(1990 , 1995 , 2011) ,
'end year' = c(1993 , 1995 , 2012))
giving a
:
country incident start.year end.year
1 AAA disaster 1990 1993
2 BBB disaster 1995 1995
3 CCC disaster 2011 2012
I would like to transform this so that each row contains the incident for each individual year instead of only the interval. Ideally, it would look like something like this:
country incident year
1 AAA disaster 1990
2 AAA disaster 1991
3 AAA disaster 1992
4 AAA disaster 1993
5 BBB disaster 1995
6 CCC disaster 2011
7 CCC disaster 2012
Is there an optimal code that can transform this to include a start and end year?
We may use map2
to get the sequence between the two columns as a list
and then unnest
the list
column
library(dplyr)
library(purrr)
library(tidyr)
a %>%
transmute(country, incident, year = map2(start.year, end.year, `:`)) %>%
unnest(year)
-output
# A tibble: 7 × 3
country incident year
<chr> <chr> <int>
1 AAA disaster 1990
2 AAA disaster 1991
3 AAA disaster 1992
4 AAA disaster 1993
5 BBB disaster 1995
6 CCC disaster 2011
7 CCC disaster 2012
If the 'country' column is unique
, either use a group by/summarise or use rowwise
to expand as well
a %>%
group_by(country) %>%
summarise(incident, year = start.year:end.year, .groups = 'drop')
# A tibble: 7 × 3
country incident year
<chr> <chr> <int>
1 AAA disaster 1990
2 AAA disaster 1991
3 AAA disaster 1992
4 AAA disaster 1993
5 BBB disaster 1995
6 CCC disaster 2011
7 CCC disaster 2012
Or use uncount
to expand the data
a %>%
uncount(end.year - start.year + 1) %>%
group_by(country) %>%
mutate(year = start.year + row_number() - 1, .keep = 'unused',
end.year = NULL) %>%
ungroup
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With