 

Unexpected behaviour of "formula" with "data.table" in R

I am trying to dynamically build a formula to use in dynlm. I have run into a behaviour of the formula function that I do not understand, which can be seen from this code:

library(data.table)
dt_test <- data.table("a"=rnorm(10), "b"=1:5)

dt_test[, .(.(
   formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")
 )), .(b)]

The code above is expected to produce identical formulas for each value of b. The formula is wrapped in .(.(...)) so that j returns a list, which lets it be stored as a list column of the resulting data.table.

However, the printed formula does not match the string originally provided: a comma appears between the + and tt, as you can see from the output:

       b                                                                           V1
   <int>                                                                       <list>
1:     1 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + ,    tt + tt2
2:     2 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + ,    tt + tt2
3:     3 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + ,    tt + tt2
4:     4 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + ,    tt + tt2
5:     5 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + ,    tt + tt2

Essentially, it adds a comma where there is none. It still happens if I rearrange the terms of the sum, but it stops happening if I remove q_val, for example. The same goes for as.formula.

I would like to understand what is going on and avoid it.

asked Oct 19 '25 02:10 by oibaFox


2 Answers

This is just a cosmetic printing issue caused by the way R deparses long formulas for display.

If you run:

formula(paste0("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2"))

You will see that R prints it on two lines by default, breaking the line before "tt + tt2" (no matter how wide the console is):

#z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + 
#    tt + tt2

This reflects how R deparses the formula for display: if you run deparse(), it returns a character vector of length 2:

deparse(formula(paste0("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")))

# [1] "z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + "
# [2] "    tt + tt2"  
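To see where the comma in the data.table output comes from, the sketch below (plain R, no data.table needed) reproduces the printed cell text. The assumption, not something confirmed by the data.table documentation here, is that the print method formats the formula by deparsing it and joins the resulting pieces with a comma; pasting the two deparse() pieces with collapse = "," gives exactly the string shown in the question.

```r
# Plain R; no data.table needed for this check.
f <- formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")

# deparse() starts a new line once the text passes width.cutoff (60 bytes by default)
pieces <- deparse(f)
length(pieces)   # 2

# Joining the pieces with "," gives the same text as the cells in the question's
# output, including the comma and the four-space continuation indent
cell_text <- paste(pieces, collapse = ",")
cell_text

# A larger cutoff (500 is the maximum allowed) keeps the formula on one line
one_line <- deparse(f, width.cutoff = 500L)
length(one_line)   # 1
```

Raising width.cutoff only changes the text representation; the formula object itself is the same either way.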

However, if you assign the result of your original code to dt_formulas, you will see that the formulas are stored intact:

dt_formulas <- dt_test[, .(.(
  formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")
)), .(b)]

dt_formulas[[2]]

# [[1]]
# z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 +
#   tt + tt2
# <environment: 0x7fa96ff6ffd8>
#   
# [[2]]
# z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 +
#   tt + tt2
# <environment: 0x7fa96ff6ffd8>
# ....

As you mentioned, this is also why the comma disappears when you remove some of the variables from the formula: it has nothing to do with which term you remove; you are simply shortening the formula enough to avoid the automatic line break.
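If you still want a readable column when printing, one option is to keep the formula column for modelling and add a character copy for display. This is a sketch assuming R >= 4.0.0 (for deparse1(), which collapses the deparsed pieces into a single string); the column name formula_txt is hypothetical.

```r
library(data.table)

f_txt <- "z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2"
dt_test <- data.table(a = rnorm(10), b = 1:5)
dt_formulas <- dt_test[, .(.(formula(f_txt))), .(b)]

# The stored formula round-trips to the original text once deparsed on one line
deparse1(dt_formulas$V1[[1]]) == f_txt

# Hypothetical display column: one single-line string per row, so no stray comma
dt_formulas[, formula_txt := vapply(V1, deparse1, character(1))]
dt_formulas
```

The V1 column is still what you would pass to dynlm; formula_txt exists only so the printed table is easy to read.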

answered Oct 21 '25 17:10 by jpsmith


Maybe you want to add a list column, something like this:

> library(data.table)
> dt_test <- data.table("a"=rnorm(10), "b"=1:5)
> dt_test[, x := list(rep(list(as.formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")), .N))]
> dt_test$x[[1]]
z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + 
    tt + tt2
<environment: 0x56169e0ce3c8>

This looks weird during printing,

> dt_test |> head(2)
            a     b                                                                            x
        <num> <int>                                                                       <list>
1: -0.5439367     1 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + ,    tt + tt2
2:  0.1078461     2 z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + ,    tt + tt2

but actually is a formula:

> class(dt_test$x[[1]])
[1] "formula"
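Beyond class(), a quick check that the list-column element behaves like any other formula: the sketch below reuses the code above and inspects the object with length() and all.vars(). Nothing here depends on how data.table prints the column.

```r
library(data.table)

dt_test <- data.table(a = rnorm(10), b = 1:5)
dt_test[, x := list(rep(list(as.formula("z_val ~ s_val + q_val + L(s_dval, 32:0) + L(q_dval, 2:0) + 1 + tt + tt2")), .N))]

f <- dt_test$x[[1]]

# A two-sided formula is a call of length 3: `~`, the response and the right-hand side
length(f)

# all.vars() returns the variable names; function names such as L are skipped
all.vars(f)
```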

You might adapt that to your dynamic .(b) grouping. I'm not sure whether you need rep() then; data.table is strict about recycling, so it's needed in this example.
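One way the grouped version might look, as a sketch rather than a tested recipe for dynlm: the formula text is built inside j from the group value .BY$b. Using b as the maximum lag of s_dval is a hypothetical rule chosen only for illustration, and the column name model is also hypothetical. Because each group returns a single row, no rep() is needed here.

```r
library(data.table)

dt_test <- data.table(a = rnorm(10), b = 1:5)

# Hypothetical rule, for illustration only: the lag order of s_dval comes from b.
# .BY holds the current group's value of the by column.
dt_dyn <- dt_test[, list(model = list(
  formula(sprintf("z_val ~ L(s_dval, %d:0) + tt", .BY$b))
)), by = b]

# One row per group; each cell of the list column holds one formula
dt_dyn$model[[3]]
```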

answered Oct 21 '25 15:10 by jay.sf


