I am trying to optimize this nested for loop, which takes the minimum of two numbers and then stores the result in a data frame. I was able to cut the run time down significantly by vectorizing and preallocating, but I'm not sure how to apply that logic to a nested for loop. Is there a quick way to make this run faster? It currently takes over 5 hours to run.
"simulation" has 100k values, and "limits" has 5427 values.
output <- data.frame(matrix(nrow = nrow(simulation), ncol = nrow(limits)))
res <- character(nrow(simulation))
for (i in 1:nrow(limits)) {
  for (j in 1:nrow(simulation)) {
    res[j] <- min(limits[i, 1], simulation[j, 1])
  }
  output[, i] <- res
}
Edit:
dput(head(simulation))
structure(list(simulation = c(124786.7479,269057.2118,80432.47896,119513.0161,660840.5843,190983.7893)), .Names = "simulation", row.names = c(NA,6L), class = "data.frame")
dput(head(limits))
structure(list(limits = c(5000L,10000L,20000L,25000L,30000L)), .Names = "limits", row.names = c(NA, 6L), class = "data.frame")
If you have more than 15 GB of RAM (~100K * 5500 * 8 bytes per number * 3 for the result, the expanded outer x values, and the expanded outer y values, which comes to about 13 GB), you can try:
outer(simulation[[1]], limits[[1]], pmin)
Although in reality you'll probably need more than 15 GB, because I think pmin will make additional copies along the way. If you don't have the RAM, you'll have to break up the problem (e.g. rely on code that processes one column at a time).
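The column-at-a-time idea could look something like the sketch below. It is only an illustration, not a tuned solution: it uses the six sample values from the dput() output in place of the full data, and the names sim and lim are introduced here for convenience. It also assumes a numeric matrix is acceptable as the result, whereas the loop in the question fills a data frame from a character vector.

# Small stand-ins for the real data (values from the dput() output in the question).
simulation <- data.frame(simulation = c(124786.7479, 269057.2118, 80432.47896,
                                        119513.0161, 660840.5843, 190983.7893))
limits <- data.frame(limits = c(5000L, 10000L, 20000L, 25000L, 30000L))

# Pull the columns out as plain vectors once (sim and lim are illustrative names).
sim <- simulation[[1]]
lim <- limits[[1]]

# Preallocate a numeric matrix: one row per simulated value, one column per limit.
output <- matrix(NA_real_, nrow = length(sim), ncol = length(lim))

# A single vectorized pmin() call per limit replaces the inner loop over j.
for (i in seq_along(lim)) {
  output[, i] <- pmin(sim, lim[i])
}

Each iteration only needs temporary space for one column, so this avoids the expanded x and y copies that outer builds. The full result is still 100,000 * 5,427 * 8 bytes, roughly 4.3 GB, so if even that is too much, each column (or block of columns) would have to be summarized or written to disk instead of kept in memory.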