Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using lists for simulation

I set myself a little challenge on my way to learning R. The question was, given a sample of 500 numbers in normal distribution with mean 20, how many numbers under 20 would I get for standard deviations from 6 to 10. Just to have to learn more I decided to get 4 samples for each sd. So by the end I should have:

sd6samp1:...

sd6samp2:...

....

sd10samp4:...

My first approach, which worked was:

 ddss<-c(6:10) # sd's
 sam<-c(1:4) # 4 samples for each
 k=0  # counter in 0
 for (i in ddss) {   # for each sd
   for (j in sam) {  # for each sample
     nam <- paste("sam",i,".",j, sep="") # building a name
     n <- assign(nam,rnorm(500, 20, i))  # the great assign function
     k <- k+sum(n<=0)
   }
   print(assign(paste("ds",i,sep=""), k)) # ohh assign you're great
   k=0 # reset counter
 }

While looking for how to create variable names with the looping 'i', founded that 'assign' does the work but it also said:

Note though that if you are planning some simulations, many guRus would say that you should use a list.

So I thoght it would be good to learn lists...

In the meanwhile I also discover a great other option... ddss <- c(6:10)

for (i in ddss) {
   print(paste('prob. x<=0), with sd=',i))
   print(pnorm(0,mean=20,sd=i)*500)
}

This worked to answer the question, but the lists were still to be done... and a lot of R has yet to be learned. The main idea wasn't to know the very prob or number of negatives... but to learn R and specifically some looping.

So, I've been trying to go with the mentioned lists

My closest approach has been:

ddss<-c(6:10) # sd's to be calculated.
sam<-c(1:4) # 4 samples for each sd
liss<-list()  # initializing the list
for (i in ddss) {   # for each sd
   liss[[i]] <- list()
   for (j in sam) {  # for each sample
      liss[[i]][[j]] <- rnorm(500, 20, i)
      print(paste('ds',i,'samp',j,'=',sum(liss[[i]][[j]]<0)))
   }
}

With this one I get the information but I'm wondering about two issues (1 & 2) and some other questions (3 & 4):

  1. I get a list of 10 elements, 6 empty ones and then 4 with sublists. I can't seem to find out how to work with elements 1:4 of the list (sd's) with the 6:9 names (the very sd's).

  2. Even though I tried, I couldn't get to name the lists elements through the 'for' loops. Any insight on these issues would be great.

  3. Since in this context of simulations. What do you think is better: nested lists (lists with sublists) or simple (longer) lists?

  4. I wondered whether the 'apply' functions would be of any help here, I tried to do something, like:

vbv<-matrix(c(6,6,6,6,7,7,7,7,8,8,8,8,9,9,9,9))
lsl<-apply(vbv, 2, function(x) rnorm(500,20,x))

But it looks I'm not getting even close....

like image 532
fioghual Avatar asked Dec 30 '25 02:12

fioghual


1 Answers

The problem is in your indexes: you are running over indexer i from ddss, which runs from 6 to 10. So in the first tour of duty in your outer loop, your first statement really says: liss[[6]]<-list(), implying that the first 5 ones are NULL.

So if you insist on working with loops, this is what you should do (check ?seq_along):

ddss<-c(6:10) # sd's to be calculated.
sam<-c(1:4) # 4 samples for each sd
liss<-list()  # initializing the list
for (i in seq_along(ddss)) {   # now, i runs from 1 to 5
   liss[[i]] <- list()
   for (j in sam) {  # for each sample
      liss[[i]][[j]] <- rnorm(500, 20, i)
      print(paste('ds',ddss[i],'samp',j,'=',sum(liss[[i]][[j]]<0)))
   }
   names(liss[[i]])<-as.character(sam)#this should solve your naming issue (1/2)
}
names(liss)<-as.character(ddss)#this should solve your naming issue (2/2)

Note that, as always, it is a good idea to name your variables something more useful than i or j: if you'd named it curds, maybe you wouldn't have used it immediately as an indexer in a list?

Now, if you are really aiming for improvement (but want to stick to lists), you indeed want to go with the apply style functions:

liss<-lapply(ddss, function(curds){ #apply the inline function to each ds and store results in a list
  return(lapply(sam, function(cursam){ #apply inline function to each sam and store results in a list
    rv<-rnorm(500, 20, curds)
    cat('ds',curds,'samp',cursam,'=',sum(rv<0), "\n") #maybe better for your purposes.
    return(rv)
  }))
}) 

Finally, for your case, there is not a lot of reason to actually use lists (nor do you even need to keep the sampled data for each ds/sam): you can store everything as a threedimensional array, but since you specify it as a learning exercise (hey, maybe the array thing can be your next exercise :-)), I'll leave it at that.

like image 79
Nick Sabbe Avatar answered Dec 31 '25 18:12

Nick Sabbe