
Split dataframe into 5 parts and use describe function for every part

I have a data frame like this:

df <- data.frame(x = 1:100, y = runif(100))

And I split it into 5 parts:

z <- split(df, rep(1:5, length.out = nrow(df), each = ceiling(nrow(df)/5)))

Now I'm trying to get descriptive statistics for every part in z (I'm really only interested in the df$y column within these 5 parts), but I get this error:

psych::describe(z,na.rm = TRUE)

Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) : 
  is.atomic(x) is not TRUE
In addition: Warning message:
In mean.default(x, na.rm = na.rm) :
  argument is not numeric or logical: returning NA

I'm looking for output something like this (the row labels probably won't look like z[1]$y, but please assume that is what I'm after):

           vars     n   mean     sd median trimmed    mad   min    max  range skew kurtosis   se
z[1]$y       5 44813   0.02   0.17   0.00    0.01   0.10 -0.97   8.87   9.84 6.19   211.87 0.00
....
z[5]$y       6 45220   0.15   0.07   0.14    0.15   0.05  0.05   0.81   0.76 3.83    31.53 0.00

Also, how can I use the describe function on only the y values in z[1] or z[5]?

I'm not sure how to handle the list here, so thanks in advance; I appreciate your response.

cey asked Nov 27 '25 12:11


2 Answers

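We could use lapply to apply describe to each element of the list (here the data are split into chunks of n = 20 rows):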

library(psych)

n <- 20
nr <- nrow(df)
z <- split(df, rep(1:ceiling(nr/n), each=n, length.out=nr))

lapply(z, psych::describe)

Output:

$`1`
  vars  n  mean   sd median trimmed  mad min   max range skew kurtosis   se
x    1 20 10.50 5.92   10.5   10.50 7.41   1 20.00 19.00 0.00    -1.38 1.32
y    2 20  0.37 0.30    0.3    0.34 0.32   0  0.96  0.96 0.47    -1.13 0.07

$`2`
  vars  n  mean   sd median trimmed  mad   min   max range skew kurtosis   se
x    1 20 30.50 5.92  30.50   30.50 7.41 21.00 40.00 19.00 0.00    -1.38 1.32
y    2 20  0.43 0.29   0.39    0.42 0.34  0.01  0.96  0.95 0.41    -1.14 0.06

$`3`
  vars  n  mean   sd median trimmed  mad   min   max range  skew kurtosis   se
x    1 20 50.50 5.92  50.50   50.50 7.41 41.00 60.00 19.00  0.00    -1.38 1.32
y    2 20  0.55 0.34   0.51    0.56 0.49  0.03  0.98  0.95 -0.08    -1.62 0.08

$`4`
  vars  n  mean   sd median trimmed  mad   min   max range skew kurtosis   se
x    1 20 70.50 5.92  70.50   70.50 7.41 61.00 80.00 19.00 0.00    -1.38 1.32
y    2 20  0.52 0.27   0.46    0.52 0.39  0.15  0.94  0.79 0.12    -1.59 0.06

$`5`
  vars  n  mean   sd median trimmed  mad   min    max range  skew kurtosis   se
x    1 20 90.50 5.92  90.50   90.50 7.41 81.00 100.00 19.00  0.00    -1.38 1.32
y    2 20  0.62 0.33   0.65    0.65 0.43  0.01   0.99  0.98 -0.33    -1.48 0.07
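
If only the y column matters, one possible variation is to call describe on part$y for each element and stack the one-row results with rbind. This is a minimal sketch rather than part of the output above; the names grp and y_stats and the fixed seed are illustrative assumptions.

```r
library(psych)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(x = 1:100, y = runif(100))

# Group index: five consecutive blocks of 20 rows each
grp <- rep(1:5, each = 20)
z <- split(df, grp)

# Describe only the y column of each part, then stack the one-row results
y_stats <- do.call(rbind, lapply(z, function(part) describe(part$y)))
y_stats

# Describe y for a single part, selected by position or by name
describe(z[[1]]$y)
describe(z[["5"]]$y)
```

Note the double brackets: z[1] is still a list of length one, whereas z[[1]] is the data frame itself, so z[[1]]$y is the numeric vector that describe can summarise.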
TarJae answered Nov 29 '25 00:11


I think you can use the following solution. I am not familiar with the describe function you are using, but if it takes a vector as its first argument, you can use the imap function from the purrr package to apply it only to the 1st and 5th elements. In imap, .y refers to the name (or position) of each element and .x to its value:

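library(purrr)
library(psych)  # provides describe()

# .y is the element name ("1" to "5"), .x is the data frame for that part
imap(z, ~ if (.y %in% c(1, 5)) {
  describe(.x[["y"]])   # summarise only the y column
} else {
  .x                    # leave the other parts unchanged
})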

Here is another more compact solution in base R, suggested by my dear friend @akrun:

z[c("1", "5")] <- lapply(z[c("1", "5")], describe)
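
The assignment above overwrites elements 1 and 5 of z and describes every column in them. If you would rather keep z intact and summarise only y, a small variation might look like the sketch below; the names wanted and y_desc and the fixed seed are illustrative assumptions.

```r
library(psych)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(x = 1:100, y = runif(100))
z <- split(df, rep(1:5, each = 20))

# split() names the list elements "1" to "5", so index by character name
wanted <- c("1", "5")

# Store the result separately so z still holds the original data frames
y_desc <- lapply(z[wanted], function(part) describe(part[["y"]]))
y_desc
```

Here y_desc is a list with two elements named "1" and "5", each holding a one-row describe result, while z itself is left unchanged.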
Anoushiravan R answered Nov 29 '25 00:11