I want to apply broom::tidy() to models nested in a fixest_multi object and extract the names of each list level as data frame columns. Here's an example of what I mean.
library(fixest)
library(tidyverse)
library(broom)
multiple_est <- feols(c(Ozone, Solar.R) ~ Wind + Temp, airquality, fsplit = ~Month)
This command estimates two models for each dep. var. (Ozone and Solar.R) for a subset of each Month plus the full sample. Here's how the resulting object looks like:
> names(multiple_est)
[1] "Full sample" "5" "6" "7" "8" "9"
> names(multiple_est$`Full sample`)
[1] "Ozone" "Solar.R"
I now want to tidy each model object, but keep the information of the Month / Dep.var. combination as columns in the tidied data frame. My desired output would look something like this:
I can run map_dfr from the tidyr package, giving me this result:
> map_dfr(multiple_est, tidy, .id ="Month") %>% head(9)
# A tibble: 9 x 6
Month term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Full sample (Intercept) -71.0 23.6 -3.01 3.20e- 3
2 Full sample Wind -3.06 0.663 -4.61 1.08e- 5
3 Full sample Temp 1.84 0.250 7.36 3.15e-11
4 5 (Intercept) -76.4 82.0 -0.931 3.53e- 1
5 5 Wind 2.21 2.31 0.958 3.40e- 1
6 5 Temp 3.07 0.878 3.50 6.15e- 4
7 6 (Intercept) -70.6 46.8 -1.51 1.45e- 1
8 6 Wind -1.34 1.13 -1.18 2.50e- 1
9 6 Temp 1.64 0.609 2.70 1.29e- 2
But this tidies only the first model of each Month, the model with the Ozone outcome.
My desired output would look something like this:
Month outcome term estimate more columns from tidy
Full sample Ozone (Intercept) -71.0
Full sample Ozone Wind -3.06
Full sample Ozone Temp 1.84
Full sample Solar.R (Intercept) some value
Full sample Solar.R Wind some value
Full sample Solar.R Temp some value
... rows repeated for each month 5, 6, 7, 8, 9
How can I apply tidy to all models and add another column that indicates the outcome of the model (which is stored in the name of the model object)?
So, fixest_mult has a pretty strange setup as I delved deeper. As you noticed, mapping across it or using apply just accesses part of the data frames. In fact, it isn't just the data frames for "Ozone", but actually just the data frames for the first 6 data frames (those for c("Full sample", "5", "6").
If you convert to a list, it access the data attribute, which is a sequential list of all 12 data frames, but dropping the relevant names you're looking for. So, as a workaround, could use pmap() and the names (found in the attributes of the object) to tidy() and then use mutate() for your desired columns.
library(fixest)
library(tidyverse)
library(broom)
multiple_est <- feols(c(Ozone, Solar.R) ~ Wind + Temp, airquality, fsplit = ~Month)
nms <- attr(multiple_est, "meta")$all_names
pmap_dfr(
list(
data = as.list(multiple_est),
month = rep(nms$sample, each = length(nms$lhs)),
outcome = rep(nms$lhs, length(nms$sample))
),
~ tidy(..1) %>%
mutate(
Month = ..2,
outcome = ..3,
.before = 1
)
)
#> # A tibble: 36 × 7
#> Month outcome term estimate std.error statistic p.value
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Full sample Ozone (Intercept) -71.0 23.6 -3.01 3.20e- 3
#> 2 Full sample Ozone Wind -3.06 0.663 -4.61 1.08e- 5
#> 3 Full sample Ozone Temp 1.84 0.250 7.36 3.15e-11
#> 4 Full sample Solar.R (Intercept) -76.4 82.0 -0.931 3.53e- 1
#> 5 Full sample Solar.R Wind 2.21 2.31 0.958 3.40e- 1
#> 6 Full sample Solar.R Temp 3.07 0.878 3.50 6.15e- 4
#> 7 5 Ozone (Intercept) -70.6 46.8 -1.51 1.45e- 1
#> 8 5 Ozone Wind -1.34 1.13 -1.18 2.50e- 1
#> 9 5 Ozone Temp 1.64 0.609 2.70 1.29e- 2
#> 10 5 Solar.R (Intercept) -284. 262. -1.08 2.89e- 1
#> # … with 26 more rows
fixest changed the way these objects are structured (I'm using version 0.12.0), so the accepted solution broke.
fixest::coeftable provides the results in a data.frame.
library(fixest)
library(tidyverse)
library(broom)
multiple_est <- feols(c(Ozone, Solar.R) ~ Wind + Temp, airquality, fsplit = ~Month)
coeftable(multiple_est) |> head(6)
#> id sample.var sample lhs coefficient Estimate Std. Error
#> 1 1 Month Full sample Ozone (Intercept) -71.033218 23.5779922
#> 2 1 Month Full sample Ozone Wind -3.055491 0.6632503
#> 3 1 Month Full sample Ozone Temp 1.840179 0.2499634
#> 4 2 Month Full sample Solar.R (Intercept) -76.362113 81.9993511
#> 5 2 Month Full sample Solar.R Wind 2.210922 2.3078515
#> 6 2 Month Full sample Solar.R Temp 3.074600 0.8778282
#> t value Pr(>|t|)
#> 1 -3.0126915 3.196240e-03
#> 2 -4.6068443 1.080046e-05
#> 3 7.3617932 3.149109e-11
#> 4 -0.9312527 3.532924e-01
#> 5 0.9580001 3.396796e-01
#> 6 3.5025081 6.153870e-04
Pretty nice! But it's not a tibble and the names are difficult to work with.
While tidy doesn't have built-in support for fixest_multi ...
tidy(multiple_est)
#> Error: No tidy method for objects of class fixest_multi
... we can use multiple dispatch so that tidy knows how to.
tidy.fixest_multi <- function(fixest_multi) {
coeftable(fixest_multi) |>
as_tibble() |>
rename(
outcome = lhs,
term = coefficient,
estimate = Estimate,
std.error = `Std. Error`,
statistic = `t value`,
p.value = `Pr(>|t|)`
)
}
tidy(multiple_est)
#> # A tibble: 36 × 9
#> id sample.var sample outcome term estimate std.error statistic p.value
#> <int> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 Month Full sa… Ozone (Int… -71.0 23.6 -3.01 3.20e- 3
#> 2 1 Month Full sa… Ozone Wind -3.06 0.663 -4.61 1.08e- 5
#> 3 1 Month Full sa… Ozone Temp 1.84 0.250 7.36 3.15e-11
#> 4 2 Month Full sa… Solar.R (Int… -76.4 82.0 -0.931 3.53e- 1
#> 5 2 Month Full sa… Solar.R Wind 2.21 2.31 0.958 3.40e- 1
#> 6 2 Month Full sa… Solar.R Temp 3.07 0.878 3.50 6.15e- 4
#> 7 3 Month 5 Ozone (Int… -70.6 46.8 -1.51 1.45e- 1
#> 8 3 Month 5 Ozone Wind -1.34 1.13 -1.18 2.50e- 1
#> 9 3 Month 5 Ozone Temp 1.64 0.609 2.70 1.29e- 2
#> 10 4 Month 5 Solar.R (Int… -284. 262. -1.08 2.89e- 1
#> # ℹ 26 more rows
Created on 2024-07-16 with reprex v2.1.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With