
What does the ( | ) syntax mean in an R formula?

Tags: r, r-formula, lme4

I am following a tutorial and came across the following syntax:

# assume 'S' is the name of the subjects column
# assume 'X1' is the name of the first factor column
# assume 'X2' is the name of the second factor column
# assume 'X3' is the name of the third factor column
# assume 'Y' is the name of the response column
# run the ART procedure on 'df'

# linear mixed model syntax; see lme4::lmer
library(ARTool)  # provides art()
m = art(Y ~ X1 * X2 * X3 + (1|S), data=df)

anova(m)

I am a bit confused by the (|) syntax. I looked at the documentation for the linear mixed model syntax lmer, and found: "Random-effects terms are distinguished by vertical bars (|) separating expressions for design matrices from grouping factors".

So I assume 1 and S here are two random effects terms. S makes sense as a random effect since it is a random variable that could stand for participant. But how is 1 a random variable? What do the 1 and the | mean here?

Asked by Null Salad, Oct 20 '25 13:10


1 Answer

The | symbol is used in formulas in different ways by different functions. In linear mixed models, it is used to denote random effects. There are different types of random effects that can be used in mixed models:

  • Random intercepts, where the intercepts (but not the slopes) vary between subjects.
  • Random slopes, where the slopes (but not the intercepts) vary between subjects.
  • Random slopes and intercepts, where both vary between subjects. The slopes and intercepts can be modelled as being either correlated or uncorrelated.
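Written out as equations, the three cases differ only in which coefficients get a subject-specific deviation. The notation below is an assumption made for illustration (it does not come from the question or from lme4): i indexes subjects, j indexes observations within a subject, the betas are fixed effects, and the b terms are zero-mean normal random effects.

```latex
\begin{align*}
\text{Random intercept:} \quad
  & y_{ij} = (\beta_0 + b_{0i}) + \beta_1 x_{ij} + \varepsilon_{ij} \\
\text{Random slope:} \quad
  & y_{ij} = \beta_0 + (\beta_1 + b_{1i})\, x_{ij} + \varepsilon_{ij} \\
\text{Random intercept and slope:} \quad
  & y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, x_{ij} + \varepsilon_{ij}
\end{align*}
```

In the last case, the correlated version lets b_{0i} and b_{1i} covary within a subject, while the uncorrelated version fixes that covariance at zero.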

The terms to the left of the | specify which of these to use: 1 stands for the intercept, and 0 suppresses it. The grouping factor goes to the right of the |. Here are some examples, taken from my book:

library(lme4)
# Random intercept:
m1 <- lmer(Reaction ~ Days + (1|Subject), data = sleepstudy)

# Random slope:
m2 <- lmer(Reaction ~ Days + (0 + Days|Subject), data = sleepstudy)

# Correlated random intercept and slope:
m3 <- lmer(Reaction ~ Days + (1 + Days|Subject), data = sleepstudy)

# Uncorrelated random intercept and slope:
m4 <- lmer(Reaction ~ Days + (1|Subject) + (0 + Days|Subject),
           data = sleepstudy)

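So in your example, (1|S) adds a random intercept for each level of S, so each subject gets its own baseline. The 1 is not a random variable itself; it stands for the intercept, and S is the grouping factor.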
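To see why 1 means "intercept", it can help to look at how base R expands an ordinary formula into a design matrix. The following is a minimal sketch using a made-up data frame d (an assumption for illustration, not data from the question); it only shows the fixed-effects convention that the random-effects notation borrows.

```r
# Toy data (hypothetical; not from the original post)
d <- data.frame(Days = c(0, 1, 2, 3))

# '1' keeps the intercept column; ~ 1 + Days is the same as ~ Days
X_with <- model.matrix(~ 1 + Days, data = d)
colnames(X_with)     # "(Intercept)" "Days"

# '0' drops the intercept, leaving only the slope column
X_without <- model.matrix(~ 0 + Days, data = d)
colnames(X_without)  # "Days"
```

Read the same way, (1|S) asks for the intercept column to vary by S, and (0 + Days|Subject) asks for only the Days column to vary by Subject.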

A similar but notationally different use of | can be found in formulas for lmtree from partykit, which fits decision trees with linear models in the nodes. In that case, the formula looks like y ~ x1 + x2 | z1 + z2 + z3, where y is the response variable, the x variables are the explanatory variables in the linear models, and the z variables are the variables used for building the tree.
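To R's parser, | is just another operator, so what it means in a formula is decided by the function that receives it. The base R sketch below inspects the two formula shapes discussed here without fitting anything; the object names (f_tree, f_lmer and so on) are made up for illustration.

```r
# Formulas are unevaluated, so the variables need not exist.
f_tree <- y ~ x1 + x2 | z1 + z2 + z3   # lmtree-style formula from the text
f_lmer <- Y ~ X1 + (1 | S)             # lmer-style formula

# '|' binds more loosely than '+', so it splits the whole right-hand side.
rhs        <- f_tree[[3]]
bar_op     <- as.character(rhs[[1]])   # "|"
left_side  <- deparse(rhs[[2]])        # "x1 + x2"
right_side <- deparse(rhs[[3]])        # "z1 + z2 + z3"

# In the lmer-style formula the parentheses confine '|' to a single term.
re_term <- f_lmer[[3]][[3]]            # the call (1 | S)
inner   <- re_term[[2]]                # the call 1 | S
design  <- deparse(inner[[2]])         # "1"
group   <- deparse(inner[[3]])         # "S"
```

This is also why the parentheses around (1|S) matter: without them, the | would split the entire right-hand side, as it does in the lmtree-style formula.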

Answered by MånsT, Oct 23 '25 03:10


