undo (flatten) tabulation

Question

I have a large data.frame that is really a frequency table: each row holds one combination of factor levels together with its count. Here is a toy example:

> z <- data.frame(a=factor(c("x","x","x","y","y","y")),
                  b=factor(c("a","b","c","a","b","c")),
                  count=c(2,5,1,4,5,1))
> z
  a b count
1 x a     2
2 x b     5
3 x c     1
4 y a     4
5 y b     5
6 y c     1

To use the function DescTools::Lambda(), I must undo the tabulation and repeat each combination as many times as its count says. The rep function, however, produces an error:

> rep(z[,1:2], z$count)
Error in rep(z[, 1:2], z$count) : invalid 'times' argument

Can someone please suggest a correct way to achieve this?

jpsmith · Accepted Answer

You don't need to expand your table in the way you described: it may be computationally intensive for large data, and DescTools::Lambda() accepts tables. An easier way is to use xtabs to create a table and pass it to DescTools::Lambda().

Your example data were great but returned a 0 value for lambda, so here I am using the example data provided in ?DescTools::Lambda() to demonstrate that it works:

# data copied from ?DescTools::Lambda()
m <- as.table(cbind(c(1768,946,115), c(807,1387,438), c(189,746,288), c(47,53,16)))

# your data structure
z <- setNames(as.data.frame(m), c("a", "b", "count"))

# cross-tabulate the counts and pass the table straight to Lambda()
DescTools::Lambda(xtabs(count ~ a + b, data = z))
#[1] 0.2076188
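
To see why the toy data from the question give 0, it can help to compute the statistic by hand. The sketch below assumes the symmetric Goodman-Kruskal lambda, which is consistent with the 0.2076188 above; `lambda_sym` is a made-up helper name for illustration, not a DescTools function.

```r
# Toy data from the question
z <- data.frame(a = factor(c("x", "x", "x", "y", "y", "y")),
                b = factor(c("a", "b", "c", "a", "b", "c")),
                count = c(2, 5, 1, 4, 5, 1))

# Cross-tabulate the counts: a 2 x 3 contingency table
tab <- xtabs(count ~ a + b, data = z)

# Hypothetical helper: symmetric Goodman-Kruskal lambda computed from a
# two-way table (an assumption about what is being measured here)
lambda_sym <- function(tab) {
  n <- sum(tab)
  sum_row_max <- sum(apply(tab, 1, max))   # largest cell in each row, summed
  sum_col_max <- sum(apply(tab, 2, max))   # largest cell in each column, summed
  max_row_total <- max(rowSums(tab))
  max_col_total <- max(colSums(tab))
  (sum_row_max + sum_col_max - max_row_total - max_col_total) /
    (2 * n - max_row_total - max_col_total)
}

# Example table from ?DescTools::Lambda
m <- as.table(cbind(c(1768, 946, 115), c(807, 1387, 438),
                    c(189, 746, 288), c(47, 53, 16)))

lambda_toy <- lambda_sym(tab)  # numerator is 10 + 10 - 10 - 10, so 0
lambda_m   <- lambda_sym(m)    # 1586 / 7639
```

In the toy table the modal cells add up to exactly the largest margins, so knowing one factor does not improve the best guess of the other, hence the 0.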

If you do want to expand, the trick is to repeat the row numbers rather than the data, and then use them to index the data.frame:

z[rep(seq_len(nrow(z)), z$count), c("a","b")]
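
A short walk-through on the toy data from the question, showing why the original rep() call fails and checking that the expansion round-trips. The names `idx` and `long` are just illustrative.

```r
# Toy data from the question
z <- data.frame(a = factor(c("x", "x", "x", "y", "y", "y")),
                b = factor(c("a", "b", "c", "a", "b", "c")),
                count = c(2, 5, 1, 4, 5, 1))

# rep() sees a data.frame as a list of columns: z[, 1:2] has length 2,
# but z$count has length 6, and 'times' must have length 1 or length(x).
n_cols  <- length(z[, 1:2])
n_times <- length(z$count)

# Repeat each row number by its count, then index the rows with the result
idx  <- rep(seq_len(nrow(z)), z$count)
long <- z[idx, c("a", "b")]
rownames(long) <- NULL  # optional: drop the duplicated row names ("1.1", ...)

# Round trip: tabulating the long form gives back the original counts
back <- table(long$a, long$b)
```

The expanded data.frame has sum(z$count) rows, so for a large table with large counts this can get big, which is the reason for preferring the xtabs route where the function accepts a table.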

I_O · Answer

Two solutions that are more explicit about the columns involved:

library(dplyr)
z |> 
  reframe(across(c(a, b), ~ rep(.x, count)))

library(data.table)
# note: `_[` (pipe placeholder in an extraction call) needs R >= 4.3
z |> 
  as.data.table() |> 
  _[,
    lapply(.SD, \(xs) rep(xs, count)),
    .SDcols = c("a", "b")
  ]

Note the speed differences, though:

Unit: microseconds
      expr      min        lq       mean    median        uq      max neval
      base   32.676   43.2020   62.08967   67.7315   74.9065  132.756   500
 datatable  382.196  438.1415  490.71282  483.7960  522.4790 1278.699   500
      tidy 1380.911 1455.9130 1617.94013 1508.0150 1586.0380 8495.531   500

(the "base" version is @Roland's solution)
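
The code behind the timings isn't shown above. A rough sketch of how such a comparison might be set up, assuming the microbenchmark package (the output format suggests it) and the toy data from the question; the wrapper names `expand_base`, `expand_dt` and `expand_tidy` are made up for this sketch, and the package-based variants are only included if the package is installed.

```r
z <- data.frame(a = factor(c("x", "x", "x", "y", "y", "y")),
                b = factor(c("a", "b", "c", "a", "b", "c")),
                count = c(2, 5, 1, 4, 5, 1))

# Base R: index by repeated row numbers
expand_base <- function(d) d[rep(seq_len(nrow(d)), d$count), c("a", "b")]

# Expressions to time, starting with the base version
exprs <- list(base = quote(expand_base(z)))

if (requireNamespace("data.table", quietly = TRUE)) {
  expand_dt <- function(d) {
    data.table::as.data.table(d)[, lapply(.SD, function(xs) rep(xs, count)),
                                 .SDcols = c("a", "b")]
  }
  exprs$datatable <- quote(expand_dt(z))
}

# reframe() was added in dplyr 1.1.0
if (requireNamespace("dplyr", quietly = TRUE) &&
    utils::packageVersion("dplyr") >= "1.1.0") {
  expand_tidy <- function(d) {
    dplyr::reframe(d, dplyr::across(c(a, b), ~ rep(.x, count)))
  }
  exprs$tidy <- quote(expand_tidy(z))
}

# Run the timings only if microbenchmark is available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(list = exprs, times = 500))
}
```

On a six-row table the numbers mostly reflect fixed per-call overhead, so it would be worth repeating this on data of realistic size before drawing conclusions.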

Tags:

r

cdalitz