I have the following data frame (simplified), in which the country variable is a factor and the value variable has missing values:
country value
AUT     NA
AUT     5
AUT     NA
AUT     NA
GER     NA
GER     NA
GER     7
GER     NA
GER     NA
The following generates the above data frame:
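# stringsAsFactors = TRUE keeps country a factor on R >= 4.0, where the default is FALSE
data <- data.frame(country = c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
                   value = c(NA, 5, NA, NA, NA, NA, 7, NA, NA),
                   stringsAsFactors = TRUE)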
Now I would like to replace the NA values within each country subset using last observation carried forward (LOCF). I know the na.locf function in the zoo package. data <- na.locf(data) would give me the following data frame:
country value
AUT     NA
AUT     5
AUT     5
AUT     5
GER     5
GER     5
GER     7
GER     7
GER     7
However, the function should be applied only within the individual subsets split by country. This is the output I need:
country value
AUT     NA
AUT     5
AUT     5
AUT     5
GER     NA
GER     NA
GER     7
GER     7
GER     7
I can't think of an easy way to implement it. Before starting with for-loops, I was wondering if anyone has any idea as to how to solve this.
Many thanks!!
A modern version of the ddply solution uses the dplyr package:
library(dplyr)
library(zoo)  # provides na.locf
DF %>%
  group_by(country) %>%
  mutate(value = na.locf(value, na.rm = FALSE))
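The na.rm = FALSE argument matters here because mutate() needs a result of the same length as each group. As a small illustration (the vector x is made up for this example, mirroring the GER rows), na.locf drops leading NAs by default and only keeps them when na.rm = FALSE:

```r
library(zoo)

# Illustrative vector mirroring the GER group: two leading NAs, then a value
x <- c(NA, NA, 7, NA, NA)

# Default (na.rm = TRUE): leading NAs are dropped, so the result is shorter
short <- na.locf(x)
length(short)

# na.rm = FALSE keeps the leading NAs, so the length matches the input
full <- na.locf(x, na.rm = FALSE)
length(full)
```

With the default, the shortened result would not fit back into the grouped column, which is why the dplyr call passes na.rm = FALSE.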
Here's a ddply solution:
library(plyr)
library(zoo)  # provides na.locf
ddply(DF, .(country), na.locf)
  country value
1     AUT  <NA>
2     AUT     5
3     AUT     5
4     AUT     5
5     GER  <NA>
6     GER  <NA>
7     GER     7
8     GER     7
9     GER     7
Edit
From the ddply help page:
.variables:  variables to split data frame by, 
as quoted variables, a formula or character vector.
So other ways to get what you want are:
ddply(DF, "country", na.locf)
ddply(DF, ~country, na.locf)
Note that passing a column such as DF$country as .variables is not allowed, which is why you got an error when you tried it.
Here DF is your data.frame.
The tidyverse way, using tidyr's fill() instead of na.locf, is:
library(tidyverse)
data %>% 
    group_by(country) %>% 
    fill(value)
Source: local data frame [9 x 2]
Groups: country [2]
country value
(fctr) (dbl)
1     AUT    NA
2     AUT     5
3     AUT     5
4     AUT     5
5     GER    NA
6     GER    NA
7     GER     7
8     GER     7
9     GER     7
Split the data.frame with by and use na.locf on the subsets:
do.call(rbind,by(data,data$country,na.locf))
If you would like to remove the row names:
do.call(rbind,unname(by(data,data$country,na.locf)))
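If you would rather avoid extra packages, the same split-and-fill idea can be sketched in base R with ave(). Note that locf below is a hypothetical helper written for this example, not a function from zoo or base R, and it assumes that leading NAs should stay NA:

```r
# Hypothetical helper: carry the last non-NA value forward within a vector
locf <- function(x) {
  idx <- cumsum(!is.na(x))        # count of non-NA values seen so far (0 = none yet)
  c(NA, x[!is.na(x)])[idx + 1]    # slot 1 holds NA, used for leading gaps
}

data <- data.frame(
  country = c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
  value   = c(NA, 5, NA, NA, NA, NA, 7, NA, NA)
)

# ave() applies the function separately within each country
data$value <- ave(data$value, data$country, FUN = locf)
data
```

Because ave() returns a vector in the original row order, the rows do not need to be sorted by country first.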