
Logistic stepwise regression with a fixed number of predictors

Tags:

r

r-caret

For a course I'm attending, I have to perform a stepwise logistic regression to reduce the predictors of an outcome to a fixed number and estimate the accuracy of the resulting model.

I've been trying regsubsets() from the leaps package, but I can't get the accuracy of the selected model from it.
Now I'm trying caret, because I can set its metric to "Accuracy", but I can't fix the number of predictors when I use method = "glmStepAIC" in train(), because that method has no tuning parameters.

step.model <- train(Outcome ~ .,
                    data = myDataset,
                    method = "glmStepAIC", 
                    metric = "Accuracy",
                    trControl = trainControl(method = "cv", number = 10),
                    trace = FALSE)

I've found this question, stepwise regression using caret in R, but the answer and the links there don't seem to work for me.

If not with caret, what would be the best way to achieve a reduced model with a fixed number of predictors?
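
For reference, here is roughly what the leaps attempt looked like (a sketch only; the nvmax = 5 cap and method = "seqrep" are just illustrative choices, and regsubsets() fits linear models, so it reports RSS/BIC rather than classification accuracy):

library(leaps)

# Sketch of the leaps attempt: assumes Outcome is coded 0/1 numerically.
# nvmax caps the subset size, but regsubsets() only returns linear-model
# criteria (RSS, Cp, BIC), not classification accuracy.
subsets <- regsubsets(Outcome ~ ., data = myDataset,
                      nvmax = 5, method = "seqrep")
summary(subsets)$which[5, ]   # predictors included in the 5-variable model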

asked Dec 19 '25 by h0rror_vacui


1 Answer

You can fix the number of variables kept in stepwise selection with the glmulti package. In this example, columns a through g are related to the outcome, while columns A through E are not. In glmulti(), confsetsize is the number of models to return, and setting minsize equal to maxsize fixes the number of variables in each candidate model.

library(dplyr)

set.seed(100)

# Simulate 12 standard-normal predictors: a-g will drive the outcome, A-E are noise
dat <- data.frame(a = rnorm(10000))
for (i in 2:12) {
  dat[, i] <- rnorm(10000)
}
names(dat) <- c("a", letters[2:7], LETTERS[1:5])

# Linear predictor built from the first 7 columns, with coefficients 1 through 7
Yy <- rep(0, 10000)
for (i in 1:7) {
  Yy <- Yy + i * dat[, i]
}
Yy <- 1 / (1 + exp(-Yy))   # logistic transform to success probabilities

# Draw a Bernoulli outcome for each row from its probability
outcome <- c()
for (i in 1:10000) {
  outcome[i] <- sample(c(1, 0), 1, prob = c(Yy[i], 1 - Yy[i]))
}
dat <- mutate(dat, outcome = factor(outcome))

library(glmulti)

# method = "g" uses the genetic algorithm search; minsize = maxsize = 7 forces
# every candidate model to contain exactly 7 predictors, and confsetsize = 5
# keeps the 5 best models by AIC.
mod <- glmulti(outcome ~ .,
               data = dat,
               level = 1,            # main effects only, no interactions
               method = "g",
               crit = "aic",
               confsetsize = 5,
               plotty = FALSE, report = TRUE,
               fitfunction = "glm",
               family = "binomial",
               minsize = 7,
               maxsize = 7,
               conseq = 3)

Output:

mod@objects[[1]]
Call:  fitfunc(formula = as.formula(x), family = "binomial", data = data)

Coefficients:
(Intercept)            a            b            c            d            e            f            g  
   -0.01386      1.11590      1.99116      3.00459      4.00436      4.86382      5.94198      6.89312  

Degrees of Freedom: 9999 Total (i.e. Null);  9992 Residual
Null Deviance:      13860 
Residual Deviance: 2183     AIC: 2199
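
If you also need the cross-validated accuracy the question asks for, one possible follow-up (a sketch, not part of glmulti itself) is to refit the selected 7-predictor formula with caret; here formula() is assumed to recover the formula from the glm object that glmulti stores in mod@objects[[1]]:

library(caret)

# Sketch: refit the selected 7-predictor model and estimate its accuracy
# with 10-fold cross-validation. Assumes formula() recovers the selected
# formula from the fitted glm stored by glmulti.
best_formula <- formula(mod@objects[[1]])

acc_model <- train(best_formula,
                   data = dat,
                   method = "glm",   # plain logistic fit, no further selection
                   metric = "Accuracy",
                   trControl = trainControl(method = "cv", number = 10))
acc_model$results$Accuracy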
answered Dec 21 '25 by Vons


