Generate data frame from array for logistic regression

Question

Say you have an array like

dat <-  array(c(126, 100, 35, 61, 908, 688, 497, 807, 913, 747, 336, 598, 235, 172, 58, 121,402, 308, 121, 215, 182, 156, 72, 98, 60, 99, 11, 43, 104, 89, 21, 36), dim = c(2, 2, 8),dimnames = list(a = c(1, 0), b = c(1, 0), c = 1:8))


> > dat
, , c = 1

   b
a     1  0
  1 126 35
  0 100 61

, , c = 2

   b
a     1   0
  1 908 497
  0 688 807

, , c = 3

   b
a     1   0
  1 913 336
  0 747 598

, , c = 4

   b
a     1   0
  1 235  58
  0 172 121

, , c = 5

   b
a     1   0
  1 402 121
  0 308 215

, , c = 6

   b
a     1  0
  1 182 72
  0 156 98

, , c = 7

   b
a    1  0
  1 60 11
  0 99 43

, , c = 8

   b
a     1  0
  1 104 21
  0  89 36

and you want to fit logistic regression to predict a. Is there a simple way to generate a data frame from this array to use in glm? ie a data frame like

a b c
1 1 1 for 126 rows then
...
0 1 1 for 100 rows, etc.

Basically I need to get data to fit logistic regression when given the table with counts. It seems like there should be a simple way of doing it without manually generating the data.

thanks

BenBarnes · Accepted Answer

One way would be to start with the melt function in the reshape2 package:

library(reshape2)

datM <- melt(dat)
head(datM, 2)
#   a b c value
# 1 1 1 1   126
# 2 0 1 1   100

Then dcast that data to get the numbers of outcomes on one row:

dat2 <- dcast(datM, b + c ~ a)
head(dat2, 2)
#   b c   0   1
# 1 0 1  61  35
# 2 0 2 807 497

You can then use this data to perform a glm where the response is a 2-column matrix giving the numbers of successes and failures:

response <- as.matrix(dat2[, c(4, 3)])
bb <- dat2[, "b"]
cc <- dat2[, "c"]
glm1 <- glm(response ~ bb + cc, family = binomial(link = "logit"))

However, the model degrees of freedom (and log-likelihood, etc.) will not reflect the data structure you asked for in your question. To get the specific data structure you were aiming for, you could go back to the datM object.

EDIT:

The following loops over all columns of datM except for the value column, repeating the values datM$value times:

datRep <- lapply(datM[-grep("value", names(datM))], rep, times = datM$value)

Then cbind that back into a matrix and convert to data.frame to get the data structure you wanted:

dat3 <- as.data.frame(do.call(cbind, datRep))

glm2 <- glm(a ~ b + c, data = dat3, family = binomial(link = "logit"))

The coefficients of the two models are the same:

> coef(glm1)
(Intercept)          bb          cc 
-0.43854838  0.77039283 -0.03328575 
> coef(glm2)
(Intercept)           b           c 
-0.43854838  0.77039283 -0.03328575

But, as mentioned, the degrees of freedom, etc will not be:

> glm1$deviance
[1] 29.39535
> glm2$deviance
[1] 11381.87

Matthew Lundberg · Answer

Ugly as sin, but does what you need for this example.

dat1 <- data.frame(value = as.vector(dat),
    a=dimnames(dat)$a,
    b=rep(dimnames(dat)$b, each=length(dimnames(dat)$a)),
    c=rep(dimnames(dat)$c, each=length(dimnames(dat)$a)*length(dimnames(dat)$b)))

Better would be to use melt, as in @BenBarnes answer. This is more flexible and avoids the creation of factors.

dat1 <- melt(dat)

Then to get the expanded rows, you can use rep

dat2 <- data.frame(a=rep(dat1$a, dat1$value),
                   b=rep(dat1$b, dat1$value),
                   c=rep(dat1$c, dat1$value))

Generate data frame from array for logistic regression

Tags:

r

sayhey69

2 Answers

BenBarnes

Matthew Lundberg

Recent Activity

Donate For Us

Generate data frame from array for logistic regression

Tags:

r

sayhey69

2 Answers

BenBarnes

Matthew Lundberg

Related questions

Recent Activity

Donate For Us