Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mapping broom::tidy to nested list of {fixest} models and keep name of list element

Tags:

r

tidy

broom

fixest

I want to apply broom::tidy() to models nested in a fixest_multi object and extract the names of each list level as data frame columns. Here's an example of what I mean.

library(fixest)
library(tidyverse)
library(broom)

multiple_est <- feols(c(Ozone, Solar.R) ~ Wind + Temp, airquality, fsplit = ~Month)

This command estimates two models for each dep. var. (Ozone and Solar.R) for a subset of each Month plus the full sample. Here's how the resulting object looks like:

> names(multiple_est)
[1] "Full sample" "5"           "6"           "7"           "8"           "9" 
> names(multiple_est$`Full sample`)
[1] "Ozone"   "Solar.R"

I now want to tidy each model object, but keep the information of the Month / Dep.var. combination as columns in the tidied data frame. My desired output would look something like this:

I can run map_dfr from the tidyr package, giving me this result:


> map_dfr(multiple_est, tidy, .id ="Month") %>% head(9)
# A tibble: 9 x 6
  Month       term        estimate std.error statistic  p.value
  <chr>       <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 Full sample (Intercept)   -71.0     23.6      -3.01  3.20e- 3
2 Full sample Wind           -3.06     0.663    -4.61  1.08e- 5
3 Full sample Temp            1.84     0.250     7.36  3.15e-11
4 5           (Intercept)   -76.4     82.0      -0.931 3.53e- 1
5 5           Wind            2.21     2.31      0.958 3.40e- 1
6 5           Temp            3.07     0.878     3.50  6.15e- 4
7 6           (Intercept)   -70.6     46.8      -1.51  1.45e- 1
8 6           Wind           -1.34     1.13     -1.18  2.50e- 1
9 6           Temp            1.64     0.609     2.70  1.29e- 2

But this tidies only the first model of each Month, the model with the Ozone outcome.

My desired output would look something like this:

Month       outcome         term        estimate      more columns from tidy
              
Full sample Ozone           (Intercept)   -71.0   
Full sample Ozone           Wind          -3.06   
Full sample Ozone           Temp          1.84    
Full sample Solar.R         (Intercept)   some value  
Full sample Solar.R         Wind          some value  
Full sample Solar.R         Temp          some value  

... rows repeated for each month 5, 6, 7, 8, 9

How can I apply tidy to all models and add another column that indicates the outcome of the model (which is stored in the name of the model object)?

like image 633
Felix Avatar asked Oct 28 '25 07:10

Felix


2 Answers

So, fixest_mult has a pretty strange setup as I delved deeper. As you noticed, mapping across it or using apply just accesses part of the data frames. In fact, it isn't just the data frames for "Ozone", but actually just the data frames for the first 6 data frames (those for c("Full sample", "5", "6").

If you convert to a list, it access the data attribute, which is a sequential list of all 12 data frames, but dropping the relevant names you're looking for. So, as a workaround, could use pmap() and the names (found in the attributes of the object) to tidy() and then use mutate() for your desired columns.

library(fixest)
library(tidyverse)
library(broom)

multiple_est <- feols(c(Ozone, Solar.R) ~ Wind + Temp, airquality, fsplit = ~Month)
nms <- attr(multiple_est, "meta")$all_names

pmap_dfr(
  list(
    data = as.list(multiple_est),
    month = rep(nms$sample, each = length(nms$lhs)),
    outcome = rep(nms$lhs, length(nms$sample))
  ),
  ~ tidy(..1) %>%
    mutate(
      Month = ..2,
      outcome = ..3,
      .before = 1
    )
)
#> # A tibble: 36 × 7
#>    Month       outcome term        estimate std.error statistic  p.value
#>    <chr>       <chr>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#>  1 Full sample Ozone   (Intercept)   -71.0     23.6      -3.01  3.20e- 3
#>  2 Full sample Ozone   Wind           -3.06     0.663    -4.61  1.08e- 5
#>  3 Full sample Ozone   Temp            1.84     0.250     7.36  3.15e-11
#>  4 Full sample Solar.R (Intercept)   -76.4     82.0      -0.931 3.53e- 1
#>  5 Full sample Solar.R Wind            2.21     2.31      0.958 3.40e- 1
#>  6 Full sample Solar.R Temp            3.07     0.878     3.50  6.15e- 4
#>  7 5           Ozone   (Intercept)   -70.6     46.8      -1.51  1.45e- 1
#>  8 5           Ozone   Wind           -1.34     1.13     -1.18  2.50e- 1
#>  9 5           Ozone   Temp            1.64     0.609     2.70  1.29e- 2
#> 10 5           Solar.R (Intercept)  -284.     262.       -1.08  2.89e- 1
#> # … with 26 more rows
like image 129
caldwellst Avatar answered Oct 31 '25 01:10

caldwellst


fixest changed the way these objects are structured (I'm using version 0.12.0), so the accepted solution broke.

fixest::coeftable provides the results in a data.frame.

library(fixest)
library(tidyverse)
library(broom)

multiple_est <- feols(c(Ozone, Solar.R) ~ Wind + Temp, airquality, fsplit = ~Month)

coeftable(multiple_est) |> head(6)
#>   id sample.var      sample     lhs coefficient   Estimate Std. Error
#> 1  1      Month Full sample   Ozone (Intercept) -71.033218 23.5779922
#> 2  1      Month Full sample   Ozone        Wind  -3.055491  0.6632503
#> 3  1      Month Full sample   Ozone        Temp   1.840179  0.2499634
#> 4  2      Month Full sample Solar.R (Intercept) -76.362113 81.9993511
#> 5  2      Month Full sample Solar.R        Wind   2.210922  2.3078515
#> 6  2      Month Full sample Solar.R        Temp   3.074600  0.8778282
#>      t value     Pr(>|t|)
#> 1 -3.0126915 3.196240e-03
#> 2 -4.6068443 1.080046e-05
#> 3  7.3617932 3.149109e-11
#> 4 -0.9312527 3.532924e-01
#> 5  0.9580001 3.396796e-01
#> 6  3.5025081 6.153870e-04

Pretty nice! But it's not a tibble and the names are difficult to work with.

While tidy doesn't have built-in support for fixest_multi ...

tidy(multiple_est)
#> Error: No tidy method for objects of class fixest_multi

... we can use multiple dispatch so that tidy knows how to.

tidy.fixest_multi <- function(fixest_multi) {
  coeftable(fixest_multi) |>
  as_tibble() |>
  rename(
    outcome = lhs,
    term = coefficient,
    estimate = Estimate,
    std.error = `Std. Error`,
    statistic = `t value`,
    p.value = `Pr(>|t|)`
  )
}

tidy(multiple_est)
#> # A tibble: 36 × 9
#>       id sample.var sample   outcome term  estimate std.error statistic  p.value
#>    <int> <chr>      <chr>    <chr>   <chr>    <dbl>     <dbl>     <dbl>    <dbl>
#>  1     1 Month      Full sa… Ozone   (Int…   -71.0     23.6      -3.01  3.20e- 3
#>  2     1 Month      Full sa… Ozone   Wind     -3.06     0.663    -4.61  1.08e- 5
#>  3     1 Month      Full sa… Ozone   Temp      1.84     0.250     7.36  3.15e-11
#>  4     2 Month      Full sa… Solar.R (Int…   -76.4     82.0      -0.931 3.53e- 1
#>  5     2 Month      Full sa… Solar.R Wind      2.21     2.31      0.958 3.40e- 1
#>  6     2 Month      Full sa… Solar.R Temp      3.07     0.878     3.50  6.15e- 4
#>  7     3 Month      5        Ozone   (Int…   -70.6     46.8      -1.51  1.45e- 1
#>  8     3 Month      5        Ozone   Wind     -1.34     1.13     -1.18  2.50e- 1
#>  9     3 Month      5        Ozone   Temp      1.64     0.609     2.70  1.29e- 2
#> 10     4 Month      5        Solar.R (Int…  -284.     262.       -1.08  2.89e- 1
#> # ℹ 26 more rows

Created on 2024-07-16 with reprex v2.1.0

like image 25
Ari Anisfeld Avatar answered Oct 30 '25 23:10

Ari Anisfeld