I have a data frame like this:
df <- data.frame(x = 1:100, y = runif(100))
And I split it into 5 parts:
z <- split(df, rep(1:5, length.out = nrow(df), each = ceiling(nrow(df)/5)))
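A quick way to check what this split produces is to rebuild the same grouping vector and inspect the resulting list. This is a small sketch; the set.seed() call is an addition only so the random y values are reproducible.

```r
set.seed(1)  # assumption: added only to make runif() reproducible
df <- data.frame(x = 1:100, y = runif(100))

# Grouping vector: each label 1..5 repeated 20 times (100 rows / 5 parts)
g <- rep(1:5, length.out = nrow(df), each = ceiling(nrow(df) / 5))
table(g)

# split() returns a named list of data frames, one per group label
z <- split(df, g)
names(z)         # "1" "2" "3" "4" "5"
sapply(z, nrow)  # 20 rows in each part
```

So z is a list, not a data frame, which is why functions expecting a single data frame or vector cannot take it directly.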
Now I'm trying to get descriptive statistics for every part in z (I'm actually only interested in the df$y column of these 5 parts), but I'm getting this error:
psych::describe(z,na.rm = TRUE)
Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
is.atomic(x) is not TRUE
In addition: Warning message:
In mean.default(x, na.rm = na.rm) :
argument is not numeric or logical: returning NA
I'm trying to get something like this (the row labels probably won't look like z[1]$y, but please assume that's what I'm after):
vars n mean sd median trimmed mad min max range skew kurtosis se
z[1]$y 5 44813 0.02 0.17 0.00 0.01 0.10 -0.97 8.87 9.84 6.19 211.87 0.00
....
z[5]$y 6 45220 0.15 0.07 0.14 0.15 0.05 0.05 0.81 0.76 3.83 31.53 0.00
Also, how can I use the describe function on only the y values in z[1] or z[5]?
I'm not sure how to handle the list here. Thanks in advance for your response.
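psych::describe expects a vector, matrix, or data frame, not a list of data frames, so we could use lapply to apply it to each element of the list: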
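library(psych)
n <- 20            # rows per part
nr <- nrow(df)
# label rows 1,1,...,2,2,... in blocks of n, then split df by those labels
z <- split(df, rep(1:ceiling(nr/n), each=n, length.out=nr))
lapply(z, psych::describe)   # one describe() table per part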
Output:
$`1`
vars n mean sd median trimmed mad min max range skew kurtosis se
x 1 20 10.50 5.92 10.5 10.50 7.41 1 20.00 19.00 0.00 -1.38 1.32
y 2 20 0.37 0.30 0.3 0.34 0.32 0 0.96 0.96 0.47 -1.13 0.07
$`2`
vars n mean sd median trimmed mad min max range skew kurtosis se
x 1 20 30.50 5.92 30.50 30.50 7.41 21.00 40.00 19.00 0.00 -1.38 1.32
y 2 20 0.43 0.29 0.39 0.42 0.34 0.01 0.96 0.95 0.41 -1.14 0.06
$`3`
vars n mean sd median trimmed mad min max range skew kurtosis se
x 1 20 50.50 5.92 50.50 50.50 7.41 41.00 60.00 19.00 0.00 -1.38 1.32
y 2 20 0.55 0.34 0.51 0.56 0.49 0.03 0.98 0.95 -0.08 -1.62 0.08
$`4`
vars n mean sd median trimmed mad min max range skew kurtosis se
x 1 20 70.50 5.92 70.50 70.50 7.41 61.00 80.00 19.00 0.00 -1.38 1.32
y 2 20 0.52 0.27 0.46 0.52 0.39 0.15 0.94 0.79 0.12 -1.59 0.06
$`5`
vars n mean sd median trimmed mad min max range skew kurtosis se
x 1 20 90.50 5.92 90.50 90.50 7.41 81.00 100.00 19.00 0.00 -1.38 1.32
y 2 20 0.62 0.33 0.65 0.65 0.43 0.01 0.99 0.98 -0.33 -1.48 0.07
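Since the question only needs the y column, and wants one row per part, the same lapply idea can be pointed at part$y and the one-row results stacked with rbind. This is a sketch under a few assumptions: set.seed() is added for reproducibility, and the row labels are a hypothetical choice made to mimic the z[1]$y style from the question.

```r
library(psych)

set.seed(1)  # assumption: added only for reproducibility
df <- data.frame(x = 1:100, y = runif(100))
n  <- 20
nr <- nrow(df)
z  <- split(df, rep(1:ceiling(nr / n), each = n, length.out = nr))

# describe() only the y column of every part; each result is a one-row data frame
y_stats <- lapply(z, function(part) psych::describe(part$y))

# Stack the one-row results into a single table, one row per part
res <- do.call(rbind, y_stats)
# Hypothetical labels, chosen to resemble the desired output in the question
rownames(res) <- paste0("z[[", names(z), "]]$y")
res
```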
I think you can use the following solution. I am not familiar with the describe function you are using, but if it takes a vector as its first argument, you can use the imap function from the purrr package to apply it only to the 1st and 5th elements. In imap, .x refers to each element's value and .y to its name (or to its position if the list is unnamed); all other elements are returned unchanged:
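library(purrr)
library(psych)   # provides describe()
# z is named "1".."5", so .y is the name; %in% compares it as character
imap(z, ~ if (.y %in% c(1, 5)) {
  describe(.x[["y"]])   # statistics for the y column only
} else {
  .x                    # other parts are returned as they are
})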
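For a single part, no list function is needed at all; the key is indexing with [[ ]] rather than [ ]. z[1] is still a list holding one data frame, so z[1]$y is NULL, while z[[1]] is the data frame itself. A minimal sketch (set.seed() is added only for reproducibility):

```r
library(psych)

set.seed(1)  # assumption: added only for reproducibility
df <- data.frame(x = 1:100, y = runif(100))
z  <- split(df, rep(1:5, each = 20, length.out = nrow(df)))

# z[1] is a one-element list, so $y finds no element called "y"
is.null(z[1]$y)            # TRUE

# z[[1]] extracts the data frame, and $y gives the numeric vector
psych::describe(z[[1]]$y)

# Parts can also be picked by name; split() names them "1", ..., "5"
psych::describe(z[["5"]]$y)
```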
Here is another, more compact solution in base R, suggested by my dear friend @akrun. It describes only the y column and overwrites parts "1" and "5" of z in place:
z[c("1", "5")] <- lapply(z[c("1", "5")], function(d) describe(d$y))
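If the split is only a means to getting per-group statistics, psych also provides describeBy(), which does the grouping itself. This is a different technique from the split() plus lapply/imap approaches, offered as a sketch: with mat = TRUE it returns one data frame with one row per group, which is close to the table asked for in the question. The exact column layout may differ between psych versions, and set.seed() is added only for reproducibility.

```r
library(psych)

set.seed(1)  # assumption: added only for reproducibility
df <- data.frame(x = 1:100, y = runif(100))
g  <- rep(1:5, each = 20, length.out = nrow(df))  # same grouping as the split()

# describeBy() groups df$y by g internally; mat = TRUE returns a single
# data frame (one row per group) instead of a printed list of tables
res <- psych::describeBy(df$y, group = g, mat = TRUE)
res
```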