
Shortening Length of Function Calls in R - RevoScaleR rxGlm()

Tags:

r

glm

revoscaler

I'm using R to fit some GLMs on a large data set at the moment. Because of its size I'm using the rxGlm() function in the RevoScaleR package - it runs a lot faster than the basic glm() function.

I'm keeping all of the function calls in an R script so that I can reproduce my work later - audit trail, etc.

My function calls are very long because I have a lot of factors (~50). They all look something like this:

rxGlm_C <- rxGlm(Dependent.Variable ~
               1 +
               Factor1 +
               Factor2 +
               Factor3 +
                     ...........
               FactorN,
             family = tweedie(var.power = 1.5, link.power = 0),
             data = myDataFrame,
             pweights = "Weight.Variable"
)

If, afterwards, I want to rerun the model fit but perhaps with just a slight change to the formula - typically removing a single factor at a time - is there any shorthand notation for this? At the moment I'm copying and pasting the function call into my script file and manually deleting single rows. Is there instead some kind of syntax that says:

"please fit the exact same GLM as last time, but remove Factor 13"?

It would make my script files an awful lot shorter. I've got about 3,000 lines of code in there at the moment and I'm not finished yet!

Thanks. Alan

Asked by Alan, Nov 24 '25 08:11


1 Answer

There are two cases. If you are using all the variables from myDataFrame, then you may simply write

rxGlm(Dependent.Variable ~ .,
      family = tweedie(var.power = 1.5, link.power = 0),
      data = myDataFrame, pweights = "Weight.Variable")

for the full model and then, say,

rxGlm(Dependent.Variable ~ . - Factor13,
      family = tweedie(var.power = 1.5, link.power = 0),
      data = myDataFrame, pweights = "Weight.Variable")

to drop Factor13.
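To see what the ". - term" shorthand expands to, here is a minimal sketch that uses base R's glm() on the built-in mtcars data as a stand-in. The data set and variable names are assumptions chosen for illustration only, and it is worth confirming on a small sample that rxGlm treats "." the same way before relying on it.

```r
# Stand-in example with stats::glm() and the built-in mtcars data
# (not rxGlm; the variable names here are purely illustrative).
d <- mtcars[, c("mpg", "wt", "hp", "qsec")]

full    <- glm(mpg ~ .,        data = d)  # uses wt, hp and qsec
reduced <- glm(mpg ~ . - qsec, data = d)  # same terms, minus qsec

# Inspect which terms each fit actually used.
attr(terms(full), "term.labels")
attr(terms(reduced), "term.labels")
```

The second call differs from the first only in the formula, so the rest of the call can stay exactly as it was.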

If you are not using all the variables, then you could save your full formula, say,

frml <- y ~ Factor1 + Factor2 + Factor3

and then use update:

update(frml, ~ . - Factor3)
# y ~ Factor1 + Factor2

Note, though, that in this case . means "the same right hand side as in frml", rather than "all the variables" as in the former option.

Also, in the latter case, you can build the full formula programmatically from a character vector of factor names with paste and formula, rather than typing it out.
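As a rough sketch of building the formula from strings and then dropping one factor at a time: the factor names and the response y below are hypothetical placeholders, and only base R functions (paste, as.formula, reformulate, update) are used.

```r
# Hypothetical factor names; in practice these might come from
# names(myDataFrame) or a hand-written character vector.
factors <- paste0("Factor", 1:5)

# Build the full formula from the character vector.
full_frml <- as.formula(paste("y ~", paste(factors, collapse = " + ")))

# reformulate() from stats builds the same formula without manual pasting.
full_frml2 <- reformulate(factors, response = "y")

# One reduced formula per factor, each dropping a single term.
drop_one <- lapply(factors, function(f) update(full_frml, paste("~ . -", f)))
names(drop_one) <- factors

drop_one[["Factor3"]]
```

Each element of drop_one is an ordinary formula object, so it should be possible to pass it as the first argument of the model call in a loop or lapply while keeping family, data and pweights unchanged.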

Answered by Julius Vainora, Nov 27 '25 00:11


