I want to automatically determine which coefficients in a lm belong to a factor. So assume that I have the following models:
d <- data.frame(a = gl(4, 2, 16), b = gl(2, 1, 16),
x = runif(16), y = runif(16), Y = runif(16))
l1 <- lm(Y ~ a + b + x + y, data = d)
l2 <- lm(Y ~ x + y, data = d)
Then the names of the coefficients of the first model are as follows:
names(coef(l1))
# [1] "(Intercept)" "a2" "a3" "a4" "b2"
# [6] "x" "y"
Now ideally I would like a function which tells me that a2, a3, a4and b2 are coefficients of dummy-coded factors.
For a model which does not contain any factors (like l2), the output should be NULL.
I had a look at str(l1) and I saw that there is (in case of the presence of factors in the model) a slot xlevels. I could use names(l1$xlevels) to get a list of all factors in the model and then use grep on the names of the coefficients:
names(coef(l1))[unlist(sapply(names(l1$xlevels), function(.) grep(., names(coef(l1)))))]
# [1] "a2" "a3" "a4" "b2"
But this seems to me like a very dirty work-around and won't work as soon as I have similar names in my model:
d$a4 <- runif(16)
l3 <- update(l1, . ~ . + a4, data = d)
names(coef(l3))[unlist(sapply(names(l3$xlevels), function(.) grep(., names(coef(l3)))))]
# [1] "a2" "a3" "a4" "a4" "b2"
Also, a change of the default contrasts will change the name of the dummy coefficients in my models, so even the most elaborate strategy working on the names of the coefficients probably won't work.
Long story short: how do I get a list of all coefficients which belong to a factor?
Here are some approaches:
1) This assumes that any column of the model.matrix containing only zeros and ones belongs to a factor (except the intercept). It works for l1, l2 and l3, is quite short, does not depend on names (except for the intercept) and does not require fiddling with lm object components. It works for both main effects and interactions since if the main effects are 0/1 then the interactions will be too. l4 in the comments is an exammple where the 0/1 assumption does not hold.
m <- model.matrix(l1)
all01 <- apply(m == 0 | m == 1, 2, all)
setdiff(names(all01[all01]), "(Intercept)")
## [1] "a2" "a3" "a4" "b2"
2) This does not use names (except for intercept) and works for l1, l2 and l3 (and l4 in the comments). It does not assume anything about the model matrix but only works for main effects only models. (The no-intercept case is untested.)
cls <- attr(terms(l1), "dataClass")
intercept <- if ("(Intercept)" %in% names(coef(l1))) "" else "+ 0"
fn <- function(nm) names(coef(update(l1, paste(". ~", nm, intercept))))
setdiff(unlist(lapply(names(cls)[cls == "factor"], fn)), "(Intercept)")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With