I have this list:
lst <- list(a = c(2.5, 9.8, 5.0, 6.7, 6.5, 5.2, 34.4, 4.2, 39.5, 1.3, 0.0, 0.0, 4.1, 0.0, 0.0, 25.5, 196.5, 0.0, 104.2, 0.0, 0.0, 0.0, 0.0, 0.0), b = c(147.4, 122.9, 110.2, 142.3))
I would like to calculate a z-score for every value in each element of the list (a and b), defined as (x[i] - mean(x)) / sd(x), where x is the full vector of values in one list element and x[i] is a single value within that element.
I tried with lapply:
lapply(lst,function (x) as.data.frame(apply(x,2, function(y)- lapply(lst,mean)/lapply(lst,sd))))
but it throws an error.
Maybe a for loop would work, such as:
lst.new <- vector("list",1)
for (i in 1:length(lst)){
for (j in 1:dim(data.frame(lst[i]))[1]){
res[j] <- (as.numeric(unlist(lst[i]))[j]-mean(as.numeric(unlist(lst[i])))/
sd(as.numeric(unlist(lst[i])))
lst.new[[i]] <- res
}
}
but the result is strange (I am sure I am building lst.new incorrectly):
[[1]]
[1] -0.3635464 -0.1982809 -0.3069486 -0.2684621 -0.2729899 -0.3024208 0.3586413 -0.3250599 0.4741007 -0.3907133
[11] -0.4201442 -0.4201442 -0.3273238 -0.4201442 -0.4201442 0.1571532 4.0284412 -0.4201442 1.9388512 -0.4201442
[21] -0.4201442 -0.4201442 -0.4201442 -0.4201442
[[2]]
[1] 0.9671130 -0.4517055 -1.1871746 0.6717671 -0.2729899 -0.3024208 0.3586413 -0.3250599 0.4741007 -0.3907133
[11] -0.4201442 -0.4201442 -0.3273238 -0.4201442 -0.4201442 0.1571532 4.0284412 -0.4201442 1.9388512 -0.4201442
[21] -0.4201442 -0.4201442 -0.4201442 -0.4201442
The expected result can be a list, or a data frame whose columns have different lengths, such as:
a b
-0.36 0.967113
-0.19 -0.45
[...] [...]
and so on...
P.S:
-0.36 == (2.5 - mean(unlist(lst[1])))/sd(unlist(lst[1]))
0.967113 == (147.4 -mean(unlist(lst[2])))/sd(unlist(lst[2]))
I would prefer a solution that uses lapply (or another function from the apply family).
Just for completeness' sake, if the scale function that @akrun pointed out did not exist, your code should have been:
lapply(lst, function(x) (x - mean(x)) / sd(x))
All those lapply calls nested inside apply mean you are trying to calculate the mean and sd of individual values rather than of each whole vector.
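For reference, a scale-based version might look roughly like the sketch below. The exact form of @akrun's suggestion is not reproduced here, so treat this as an assumption about how scale could be applied, with the names z_scale and z_manual chosen only for illustration.

```r
lst <- list(
  a = c(2.5, 9.8, 5.0, 6.7, 6.5, 5.2, 34.4, 4.2, 39.5, 1.3, 0.0, 0.0,
        4.1, 0.0, 0.0, 25.5, 196.5, 0.0, 104.2, 0.0, 0.0, 0.0, 0.0, 0.0),
  b = c(147.4, 122.9, 110.2, 142.3)
)

# By default scale() subtracts the mean and divides by sd(), but it returns
# a one-column matrix with extra attributes; as.vector() turns that back
# into a plain numeric vector.
z_scale <- lapply(lst, function(x) as.vector(scale(x)))

# The same computation written out by hand, for comparison.
z_manual <- lapply(lst, function(x) (x - mean(x)) / sd(x))

# Check that both approaches agree (up to floating-point tolerance).
all.equal(z_scale, z_manual)
```

Both versions keep a and b as separate vectors of their original lengths.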
Let's work through it step by step.
lapply takes lst and breaks it down into its elements. Each element in turn is passed as the argument to your anonymous function, so the function receives a vector of numbers. Then, using R's vectorization, we calculate for every element of that vector the element minus the mean of the whole vector, with that difference divided by the sd of the whole vector.
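A minimal sketch of that explanation, using the lst from the question. The helper name zscore is only an illustrative choice and does not appear in the original posts.

```r
lst <- list(
  a = c(2.5, 9.8, 5.0, 6.7, 6.5, 5.2, 34.4, 4.2, 39.5, 1.3, 0.0, 0.0,
        4.1, 0.0, 0.0, 25.5, 196.5, 0.0, 104.2, 0.0, 0.0, 0.0, 0.0, 0.0),
  b = c(147.4, 122.9, 110.2, 142.3)
)

# Hypothetical helper: x is one whole vector (a or b), so mean(x) and sd(x)
# are single numbers that are recycled across every element of x.
zscore <- function(x) (x - mean(x)) / sd(x)

# lapply hands each list element to zscore in turn and collects the results.
z <- lapply(lst, zscore)

lengths(z)            # each result keeps the length of its input vector
round(z$a[1:2], 2)    # first two z-scores of a
round(z$b[1:2], 2)    # first two z-scores of b
```

The first values of z$a and z$b should line up with the -0.3635464 and 0.9671130 shown in the question's own output.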
Compare that with what happens in your code:
lapply(lst,function (x) as.data.frame(apply(x,2, function(y)- lapply(lst,mean)/lapply(lst,sd))))
So the first lapply breaks lst and sends the vectors one at a time to your function.
The function then has to break the vector down by columns (apply with dimension argument 2), which is where it throws the error, because a plain vector has no dimensions. But even if it succeeded in breaking the vector down into elements, you would then have two more lapply calls that go back over the whole of lst and return lists of means and sds, rather than the single mean and sd of the current vector.
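The two failure points can be reproduced in isolation. This is only a sketch: tryCatch is used so the script keeps running, and the variable names res_apply and res_divide are made up for illustration.

```r
x <- c(147.4, 122.9, 110.2, 142.3)   # element b from the question
lst_b <- list(b = x)

# A plain numeric vector has no dim attribute, so there are no columns
# for apply(x, 2, ...) to iterate over.
dim(x)

# Capture the error instead of stopping the script.
res_apply <- tryCatch(apply(x, 2, function(y) y), error = function(e) e)
inherits(res_apply, "error")
conditionMessage(res_apply)

# lapply() always returns a list, and lists cannot be divided by each other,
# so the inner expression would fail even if apply() had worked.
res_divide <- tryCatch(lapply(lst_b, mean) / lapply(lst_b, sd),
                       error = function(e) e)
inherits(res_divide, "error")
```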