I would like to calculate the percentage change of my variable "var2" for each city over time, relative to the year 2000.
I tried this:
library(dplyr)
data <- data.frame(
  cities = c('NY','NY','NY','NY','NY', 'PL','PL','PL','PL','PL', 'AS','AS','AS','AS','AS', 'RY','RY','RY','RY','RY', 'JK','JK','JK','JK','JK'),
  year = c('2000','2002','2004','2006','2008', '2000','2002','2004','2006','2008', '2000','2002','2004','2006','2008', '2000','2002','2004','2006','2008', '2000','2002','2004','2006','2008'),
  var2 = c(12,26,17,8,14, 12,20,10,8,14, 12,20,10,8,14, 12,20,10,8,14, 12,20,10,3,5)
)
changes <- data %>%
  group_by(cities) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(variable_change = round((var2/lag(var2) - 1)*100, digits = 1))
But this calculates the percentage change between consecutive years, whereas I want the change between 2000 and 2002, 2000 and 2004, and so on.
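You can use match() to get the var2 value where year is 2000 and divide each var2 value in that city by it. This gives the ratio to the 2000 value; subtract 1 and multiply by 100 to turn it into a percentage change.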
library(dplyr)
data %>%
group_by(cities) %>%
mutate(variable_change = var2/var2[match(2000, year)])
# cities year var2 variable_change
# <chr> <chr> <dbl> <dbl>
# 1 NY 2000 12 1
# 2 NY 2002 26 2.17
# 3 NY 2004 17 1.42
# 4 NY 2006 8 0.667
# 5 NY 2008 14 1.17
# 6 PL 2000 12 1
# 7 PL 2002 20 1.67
# 8 PL 2004 10 0.833
# 9 PL 2006 8 0.667
#10 PL 2008 14 1.17
# … with 15 more rows
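Since the question asks for a percentage change, here is a minimal sketch of that last step. It assumes a two-city subset of the question's data to keep it short; the names pct and base_2000 are made up for illustration.

```r
library(dplyr)

# Two-city subset of the question's data, for illustration only.
data <- data.frame(
  cities = rep(c("NY", "JK"), each = 5),
  year   = rep(c("2000", "2002", "2004", "2006", "2008"), times = 2),
  var2   = c(12, 26, 17, 8, 14, 12, 20, 10, 3, 5)
)

pct <- data %>%
  group_by(cities) %>%
  mutate(
    # var2 in the base year for the current city
    base_2000 = var2[match("2000", year)],
    # percentage change relative to the base year, rounded to 1 decimal
    variable_change = round((var2 / base_2000 - 1) * 100, digits = 1)
  ) %>%
  ungroup()

print(pct)
```

The 2000 row of each city comes out as 0, and NY in 2002 as 116.7, which corresponds to the ratio 2.17 in the output above.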
We can also use == if each city is guaranteed to have exactly one row with year 2000.
data %>%
group_by(cities) %>%
mutate(variable_change = var2/var2[year == 2000])
We can also use %in%, which works even when year contains NAs, because it returns FALSE rather than NA for missing values.
library(dplyr)
data %>%
group_by(cities) %>%
mutate(variable_change = var2/var2[year %in% 2000])
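To see how the three lookups differ when year has a missing value, here is a small base R sketch on toy vectors for a single city. The NA is made up for illustration, and by_equals, by_in and by_match are hypothetical names.

```r
# Toy vectors for one city; the NA in `year` is invented for this example.
year <- c("2000", NA, "2004")
var2 <- c(12, 26, 17)

# `==` gives NA for the missing year, and an NA index yields an NA element.
by_equals <- var2[year == 2000]

# `%in%` gives FALSE for the missing year, so only the matching row is kept.
by_in <- var2[year %in% 2000]

# match() returns the position of the first match, so the NA is ignored.
by_match <- var2[match(2000, year)]

print(by_equals)  # 12 NA
print(by_in)      # 12
print(by_match)   # 12
```

With ==, the denominator would have two elements instead of one, so the division no longer uses a single base value per city.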