I am having trouble replicating in R a project that was originally done in Stata. One of the key snags I'm hitting is that I need to generate a variable that counts the number of years since the previous observation. Here's a simple recreation of what the data might look like:
data <- cbind(1960:1970, c(NA, NA, 22, NA, NA, NA, 24, NA, NA, NA, 22), c(NA, NA, NA, NA, NA, NA, 4, NA, NA, NA, 4))
      [,1] [,2] [,3]
 [1,] 1960   NA   NA
 [2,] 1961   NA   NA
 [3,] 1962   22   NA
 [4,] 1963   NA   NA
 [5,] 1964   NA   NA
 [6,] 1965   NA   NA
 [7,] 1966   24    4
 [8,] 1967   NA   NA
 [9,] 1968   NA   NA
[10,] 1969   NA   NA
[11,] 1970   22    4
I currently have the first two columns of data and I'm trying to automate the creation of column three with a function.
The third column is the number of years since the previous non-NA value in the second column. It is only filled in from the second non-NA value onward: the first time column two has a value there is nothing earlier to count from, so column three stays NA in that row.
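A minimal base-R sketch of that definition, assuming the rows are sorted by year (the function name years_since_loop is hypothetical). It walks down the rows once, remembering the year of the most recent non-NA value:

```r
# Hypothetical helper: for each row, the number of years since the previous
# non-NA value of x. Assumes `year` is sorted in ascending order.
years_since_loop <- function(year, x) {
  out <- rep(NA_real_, length(year))
  last_year <- NA_real_   # year of the most recent non-NA value seen so far
  for (i in seq_along(year)) {
    if (!is.na(x[i])) {
      # The first non-NA value has no predecessor, so it stays NA
      if (!is.na(last_year)) out[i] <- year[i] - last_year
      last_year <- year[i]
    }
  }
  out
}

years_since_loop(1960:1970, c(NA, NA, 22, NA, NA, NA, 24, NA, NA, NA, 22))
```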
If it's any help, here is the Stata code that does this, where the variable since is the third column in my simplified example and redist is the second column. It creates a new variable since, defined as the number of years since the previous non-missing value of redist, starting after the first year in which redist has a value.
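gen since=.
// Try lags n = 1..10; the first (smallest) lag at which both the current and
// the lagged redist are non-missing sets since. Gaps longer than 10 rows are not captured.
foreach n of numlist 1(1)10 {
    replace since = year - year[_n-`n'] if redist!=. & redist[_n-`n']!=. & since==.
}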
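For comparison, a literal base-R translation of that Stata loop might look like the sketch below (stata_style_since and max_lag are hypothetical names). It assumes one row per year, sorted by year. Like the original, it only looks back max_lag = 10 rows, so a gap of more than 10 years between observed values is left as NA.

```r
# Hypothetical translation of the Stata loop: for each lag 1..max_lag, fill
# `since` where the current and lagged values are both non-missing and
# `since` has not been set by a smaller lag yet.
stata_style_since <- function(year, redist, max_lag = 10) {
  n <- length(year)
  since <- rep(NA_real_, n)
  for (lag in seq_len(max_lag)) {
    if (lag >= n) break               # no row has a predecessor this far back
    idx  <- (lag + 1):n               # rows that have a row `lag` positions above
    prev <- idx - lag                 # the corresponding lagged rows
    fill <- !is.na(redist[idx]) & !is.na(redist[prev]) & is.na(since[idx])
    since[idx[fill]] <- year[idx[fill]] - year[prev[fill]]
  }
  since
}

stata_style_since(1960:1970, c(NA, NA, 22, NA, NA, NA, 24, NA, NA, NA, 22))
```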
Thanks for the help in advance!
You can add a column of NA values, then use a logical vector marking the non-NA rows of column two to fill in the year differences. This assumes we begin with only the first two columns.
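data <- cbind(data, NA)                       # add an empty third column
nona <- !is.na(data[, 2])                     # TRUE where column two has a value
data[nona, 3] <- c(NA, diff(data[nona, 1]))   # years since previous value; the first gets NA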
data
#       [,1] [,2] [,3]
#  [1,] 1960   NA   NA
#  [2,] 1961   NA   NA
#  [3,] 1962   22   NA
#  [4,] 1963   NA   NA
#  [5,] 1964   NA   NA
#  [6,] 1965   NA   NA
#  [7,] 1966   24    4
#  [8,] 1967   NA   NA
#  [9,] 1968   NA   NA
# [10,] 1969   NA   NA
# [11,] 1970   22    4
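Since the question asks for a function, the same logical-index approach can be wrapped so it works on a data frame. This is a sketch assuming a data.frame with columns named year and redist (the names used in the Stata code) and rows sorted by year; add_since is a hypothetical name.

```r
# Hypothetical wrapper around the logical-index approach above.
add_since <- function(df, year = "year", value = "redist") {
  nona <- !is.na(df[[value]])                        # rows where the value was observed
  df$since <- NA_real_                               # start with an all-NA column
  df$since[nona] <- c(NA, diff(df[[year]][nona]))    # gaps between observed years
  df
}

df <- data.frame(year = 1960:1970,
                 redist = c(NA, NA, 22, NA, NA, NA, 24, NA, NA, NA, 22))
add_since(df)
```

Unlike the Stata loop, diff() has no look-back limit, so a gap of any length between observed values is counted.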