
Applying multiple model formulas to groups of data

Tags: r, dplyr, purrr

I'd like to fit three linear models to my data and extract the residuals from each. Is there a way to apply the same steps to each model using a combination of dplyr and purrr?

I want to keep:

  1. The lm object for each model
  2. The augment output for each model
  3. The residuals for each model

Here's a working example that analyzes the mpg dataset:

library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(ggplot2)

Here are the three formulas I want to use for my lm, along with a helper that returns a fitting function for a given formula:

f1 = hwy ~ cyl
f2 = hwy ~ displ
f3 = hwy ~ cyl + displ

lin_mod = function(formula) {
  function(data) {
    lm(formula, data = data)
  }
}

This is how I extract residuals for a single formula:

mpg %>%
  group_by(manufacturer) %>%
  nest() %>%
  mutate(model = map(data, lin_mod(f1)),
         aug = map(model, augment),
         res = map(aug, ".resid"))

However, doing this for all three formulas means repeating a lot of code:

mpg %>%
  group_by(manufacturer) %>%
  nest() %>%
  mutate(model1 = map(data, lin_mod(f1)),
         aug1 = map(model1, augment),
         res1 = map(aug1, ".resid"),
         model2 = map(data, lin_mod(f2)),
         aug2 = map(model2, augment),
         res2 = map(aug2, ".resid"),
         model3 = map(data, lin_mod(f3)),
         aug3 = map(model3, augment),
         res3 = map(aug3, ".resid"))

How do I apply these steps to each formula in an elegant way? I was thinking that mutate_all, or putting the formulas into a list, might help, but I'm stuck.

kmace asked Dec 04 '25 14:12


1 Answer

You could mutate list columns in place, using mutate_at (or mutate_if). This avoids repeating the same steps for every formula and keeps the code pipeable and compact.

library(dplyr)
library(tidyr)
library(purrr)
library(broom)

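# Function factory: given a formula, return a function that fits
# that formula to every data frame in a list column
lin_mod = function(formula) {
  function(data, ...) {
    map(data, ~ lm(formula, data = .x))
  }
}

list_model <- list(cyl_model   = hwy ~ cyl,
                   displ_model = hwy ~ displ,
                   full_model  = hwy ~ cyl + displ) %>%
  lapply(lin_mod)

ggplot2::mpg %>%
  group_by(manufacturer) %>%
  nest() %>%
  # fit every formula to the nested data, one list column per formula
  mutate_at(.vars = "data", .funs = list_model) %>%
  # replace each lm object with its augment() output
  mutate_at(.vars = vars(ends_with("model")), .funs = ~ map(.x, augment)) %>%
  # keep only the residuals from each augment() output
  mutate_at(.vars = vars(ends_with("model")), .funs = ~ map(.x, ".resid")) %>%
  # recent tidyr versions expect the columns to unnest to be named
  unnest(cols = c(data, ends_with("model")))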
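
The scoped verbs (mutate_at, mutate_if, mutate_all) are superseded by across() in dplyr 1.0 and later, so the same idea could be written roughly as below. This is only a sketch, assuming dplyr >= 1.0 and tidyr >= 1.0; the names fit_each, model_fns and residuals_by_model are made up for illustration and are not part of any package.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical helper: given a formula, return a function that fits
# that formula to every data frame in a list column
fit_each <- function(formula) {
  force(formula)  # evaluate the argument now rather than lazily
  function(data) map(data, function(d) lm(formula, data = d))
}

# Named list of fitting functions; the names become column names below
model_fns <- list(cyl_model   = hwy ~ cyl,
                  displ_model = hwy ~ displ,
                  full_model  = hwy ~ cyl + displ) %>%
  lapply(fit_each)

residuals_by_model <- ggplot2::mpg %>%
  # one row per manufacturer, remaining columns nested into `data`
  nest(data = -manufacturer) %>%
  # add one list column of lm objects per formula, named after the list entry
  mutate(across(data, model_fns, .names = "{.fn}")) %>%
  # replace each lm object with the .resid column of its augment() output
  mutate(across(ends_with("_model"),
                function(models) map(models, function(m) augment(m)$.resid)))
```

The result keeps one row per manufacturer, with the nested data plus one list column of residuals per formula. To keep the lm objects and the augment output as well, the second across() call could write to new columns through its own .names pattern (for example "{.col}_res") instead of overwriting.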
dmi3kno answered Dec 07 '25 05:12


