Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

R: predict new values for groups

I've calculated a different regression for each group in a data frame:

DF.L <- DF %>%
group_by(Channel) %>%
do(Fit = rlm(L ~ -1 + Y + I(Y^2), data = .))

I want to apply this set of regressions to another data frame. To do so, I'm testing how to apply it to the same data frame:

DF %>%
group_by(Channel) %>%
do({
    Lfit <- predict(subset(DF.L, Channel == unique(.$Channel))$Fit, .)
    data.frame(., Lfit)
})
glimpse(DF)

But I keep getting this error:

Error in UseMethod("predict") : 
  no applicable method for 'predict' applied to an object of class "list"
Calls: %>% ... do_.grouped_df -> eval -> eval -> predict -> predict

What I am doing wrong?

like image 470
Medical physicist Avatar asked Dec 06 '25 03:12

Medical physicist


1 Answers

Using the built-in ChickWeight data:

library(dplyr)
library(MASS)
library(broom)
library(tidyr)
library(ggplot2)


head(ChickWeight)
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
4     64    6     1    1
5     76    8     1    1
6     93   10     1    1

Fit some models

ChickWeight_models <- ChickWeight %>% 
  group_by(Diet) %>% 
  do(fit = MASS::rlm(weight ~ Time + I(Time^2), data = .))

ChickWeight_models
Source: local data frame [4 x 2]
Groups: <by row>

# A tibble: 4 x 2
    Diet       fit
* <fctr>    <list>
1      1 <S3: rlm>
2      2 <S3: rlm>
3      3 <S3: rlm>
4      4 <S3: rlm>

So I've created a very similar object to your DF.L. It's a frame with the four groups, each with an rlm object in a list-column called fit.


Make up some test data

Now I'll make up some data to test this model on. In this case, I'll just take the original data and add some noise to each of the variables.

ChickWeight_simulated <- ChickWeight %>% 
  mutate(Time = Time + runif(length(Time)),
         weight = weight + rnorm(length(weight)))

ChickWeight_simulated 
    weight       Time Chick Diet
1 42.72075  0.9786272     1    1
2 51.12669  2.8399631     1    1
3 58.64632  4.4576380     1    1
4 63.77617  6.1083591     1    1
5 75.40434  8.1051792     1    1
6 91.75830 10.7899030     1    1

Now we want to combine the dataframe of the models with the new data to test on. First we group_by and tidyr::nest the simulated data. This creates an object that is a dataframe with the four groups and a list-column called data, each element of which contains a rolled-up dataframe.

ChickWeight_simulated %>% group_by(Diet) %>% nest()
# A tibble: 4 x 2
    Diet               data
  <fctr>             <list>
1      1 <tibble [220 x 3]>
2      2 <tibble [120 x 3]>
3      3 <tibble [120 x 3]>
4      4 <tibble [118 x 3]>

Add the original models to the new data

Then we can join it to the models dataframe:

ChickWeight_simulated %>% group_by(Diet) %>% nest() %>% 
  full_join(ChickWeight_models)
# A tibble: 4 x 3
    Diet               data       fit
  <fctr>             <list>    <list>
1      1 <tibble [220 x 3]> <S3: rlm>
2      2 <tibble [120 x 3]> <S3: rlm>
3      3 <tibble [120 x 3]> <S3: rlm>
4      4 <tibble [118 x 3]> <S3: rlm>

Now we group by Diet again, and use broom::augment to make a prediction of each model on the new simulated data. Since each group is one row, there is one element each of fit and data; we have to extract that single element out of each list-column into a usable form by using [[1]].

ChickWeight_simulated_predicted <-
ChickWeight_simulated %>% group_by(Diet) %>% nest() %>% 
  full_join(ChickWeight_models) %>% 
  group_by(Diet) %>% 
  do(augment(.$fit[[1]], newdata = .$data[[1]])) 

head(ChickWeight_simulated_predicted)
# A tibble: 6 x 6
# Groups:   Diet [1]
    Diet   weight       Time Chick  .fitted  .se.fit
  <fctr>    <dbl>      <dbl> <ord>    <dbl>    <dbl>
1      1 42.72075  0.9786272     1 43.62963 2.368838
2      1 51.12669  2.8399631     1 51.80855 1.758385
3      1 58.64632  4.4576380     1 59.67606 1.534051
4      1 63.77617  6.1083591     1 68.43218 1.534152
5      1 75.40434  8.1051792     1 80.00678 1.647612
6      1 91.75830 10.7899030     1 97.26450 1.726331

Sanity check

To prove that this really only used the model from a particular level of Diet on the simulated data from that level of Diet, we can visualize the model fit.

ChickWeight_simulated_predicted %>% 
  ggplot(aes(Time, weight)) + 
  geom_point(shape = 1) + 
  geom_ribbon(aes(Time, 
                  ymin = .fitted-1.96*.se.fit, 
                  ymax = .fitted+1.96*.se.fit),
              alpha = 0.5, fill = "black") +
  geom_line(aes(Time, .fitted), size = 1, color = "red") +
  facet_wrap(~Diet)

enter image description here

like image 157
Brian Avatar answered Dec 07 '25 18:12

Brian



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!