I have a data frame with a column that contains a model formula definition. I would like to mutate a new column in which each row holds a model fitted from that row's formula.
Some data:
# Set up
library(tidyverse)
library(lubridate)
# Create data
mydf <- data.frame(
cohort = seq(ymd('2019-01-01'), ymd('2019-12-31'), by = '1 days'),
n = rnorm(365, 1000, 50) %>% round,
cohort_cost = rnorm(365, 800, 50)
) %>%
crossing(tenure_days = 0:365) %>%
mutate(activity_date = cohort + days(tenure_days)) %>%
mutate(daily_revenue = rnorm(nrow(.), 20, 1)) %>%
group_by(cohort) %>%
arrange(activity_date) %>%
mutate(cumulative_revenue = cumsum(daily_revenue)) %>%
arrange(cohort, activity_date) %>%
mutate(payback_velocity = round(cumulative_revenue / cohort_cost, 2)) %>%
select(cohort, n, cohort_cost, activity_date, tenure_days, everything())
## wider data
mydf_wide <- mydf %>%
select(cohort, n, cohort_cost, tenure_days, payback_velocity) %>%
group_by(cohort, n, cohort_cost) %>%
pivot_wider(names_from = tenure_days, values_from = payback_velocity, names_prefix = 'velocity_day_')
Now, the final problem code block. It fails on the very last line:
models <- data.frame(
from = mydf$tenure_days %>% unique,
to = mydf$tenure_days %>% unique
) %>%
expand.grid %>%
filter(to > from) %>%
filter(from > 0) %>%
arrange(from) %>%
mutate(mod_formula = paste0('velocity_day_', to, ' ~ velocity_day_', from)) %>%
mutate(model = lm(as.formula(mod_formula), data = mydf_wide))
Error: Problem with `mutate()` input `model`.
x Input `model` must be a vector, not a `lm` object.
ℹ Input `model` is `lm(as.formula(mod_formula), data = mydf_wide)`.
If I run the last code block minus the last line and take a look at the resulting data frame 'models' it looks like this:
models %>% head
from to mod_formula
1 1 2 velocity_day_2 ~ velocity_day_1
2 1 3 velocity_day_3 ~ velocity_day_1
3 1 4 velocity_day_4 ~ velocity_day_1
4 1 5 velocity_day_5 ~ velocity_day_1
5 1 6 velocity_day_6 ~ velocity_day_1
6 1 7 velocity_day_7 ~ velocity_day_1
I tried making it a list column, but as far as I'm aware I need to group first to do that, and in this case I would need to group by every column. I amended the last code block:
models <- data.frame(
from = mydf$tenure_days %>% unique,
to = mydf$tenure_days %>% unique
) %>%
expand.grid %>%
filter(to > from) %>%
filter(from > 0) %>%
arrange(from) %>%
mutate(mod_formula = paste0('velocity_day_', to, ' ~ velocity_day_', from)) %>%
group_by_all() %>%
nest() %>%
mutate(model = lm(as.formula(mod_formula), data = mydf_wide))
However this results in the same error.
How can I add a new column onto 'models' that contains a linear model for each row based on the formula in field 'mod_formula'?
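lm is not vectorized: it fits a single model and returns one lm object, not a vector with one element per row, so mutate cannot store the result directly. Add rowwise to fit a model for each row, and wrap the result in list so it is stored in a list column.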
library(dplyr)
models <- data.frame(
from = mydf$tenure_days %>% unique,
to = mydf$tenure_days %>% unique
) %>%
expand.grid %>%
filter(to > from) %>%
filter(from > 0) %>%
arrange(from) %>%
mutate(mod_formula = paste0('velocity_day_', to, ' ~ velocity_day_', from)) %>%
rowwise() %>%
mutate(model = list(lm(as.formula(mod_formula), data = mydf_wide)))
models
# from to mod_formula model
# <int> <int> <chr> <list>
#1 1 2 velocity_day_2 ~ velocity_day_1 <lm>
#2 1 3 velocity_day_3 ~ velocity_day_1 <lm>
#3 1 4 velocity_day_4 ~ velocity_day_1 <lm>
#4 1 5 velocity_day_5 ~ velocity_day_1 <lm>
#5 1 6 velocity_day_6 ~ velocity_day_1 <lm>
#6 1 7 velocity_day_7 ~ velocity_day_1 <lm>
#...
#...
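As a smaller illustration of the same pattern, here is a minimal sketch on the built-in mtcars data. The formulas and the names toy_models and r_squared are hypothetical stand-ins for this example, not part of the original data. It also shows one way to pull a statistic back out of the list column.

```r
library(dplyr)

# Hypothetical toy formulas on mtcars, standing in for mod_formula
toy_models <- tibble(
  mod_formula = c("mpg ~ wt", "mpg ~ hp", "mpg ~ disp")
) %>%
  rowwise() %>%
  # list() wraps the single lm object so it fits into a list column
  mutate(model = list(lm(as.formula(mod_formula), data = mtcars))) %>%
  # under rowwise(), `model` refers to this row's lm object, not the whole list
  mutate(r_squared = summary(model)$r.squared) %>%
  # drop the rowwise grouping so later verbs behave as usual
  ungroup()

toy_models
```

Calling ungroup() at the end is a design choice rather than a requirement: without it the result stays rowwise, and later mutate or summarise calls would keep operating one row at a time.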
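You can also use map from purrr instead of rowwise. map returns a list, so no list wrapper is needed. Replace the last two steps of the pipeline with: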
mutate(model = purrr::map(mod_formula, ~lm(.x, data = mydf_wide)))
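For a self-contained version of the map approach, the following sketch again uses mtcars with hypothetical formulas. The names toy_models_map and slope are made up for illustration. map_dbl is used here to extract the fitted slope from each model as a plain numeric column.

```r
library(dplyr)
library(purrr)

# Hypothetical toy formulas on mtcars, standing in for mod_formula
toy_models_map <- tibble(
  mod_formula = c("mpg ~ wt", "mpg ~ hp")
) %>%
  mutate(
    # map() returns a list with one lm object per formula
    model = map(mod_formula, ~ lm(as.formula(.x), data = mtcars)),
    # second coefficient of each model is the slope of the single predictor
    slope = map_dbl(model, ~ coef(.x)[[2]])
  )

toy_models_map
```

Because no rowwise grouping is involved, the result is an ordinary tibble and needs no ungroup afterwards.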