Xn can take values of -1 or 1, each with probability 0.5, and Sn = Sn-1 + Xn. How can I compute the partial sum observed at time n, given by Sn = X1 + X2 + ... + Xn? I'm trying to simulate a random walk here. I did the following, but I'm not exactly sure it's right:
rw <- function(n) {
  x <- numeric(n)
  xdir <- c(TRUE, FALSE)
  step <- c(1, -1)
  for (i in 2:n) {
    if (sample(xdir, 1)) {
      x[i] <- x[i - 1] + sample(step, 1)
    } else {
      x[i] <- x[i - 1]
    }
  }
  list(x = x)
}
Please Help!
You can also do this really concisely and efficiently with cumsum:
set.seed(1)
n <- 1000
x <- cumsum(sample(c(-1, 1), n, TRUE))
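As a quick sanity check (my own sketch, not part of the original answer), the cumsum output satisfies the recurrence Sn = Sn-1 + Xn by construction, since each partial sum differs from the previous one by exactly the current step:

```r
set.seed(1)
n <- 1000
steps <- sample(c(-1, 1), n, replace = TRUE)  # the Xn values
S <- cumsum(steps)                            # partial sums S1, ..., Sn

# Each partial sum equals the previous one plus the current step
stopifnot(S[1] == steps[1])
stopifnot(all(diff(S) == steps[-1]))
```

This is why no explicit loop is needed: cumsum computes all n partial sums in one vectorised pass.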

This answer is just to explain why your code did not work. @jake-burkhead gave the way you should actually write the code.
In this code, you only take a step half of the time, because you are sampling from xdir to decide whether to move at all. Instead, I would recommend the following inside your loop:
for(i in 2:n){
x[i] <- x[i - 1] + sample(step, 1)
}
The sample(step, 1) call decides if the walk moves 1 or -1.
To compute the partial sums, you can use cumsum() after you generate x. The result is a vector whose ith element is the partial sum of the walk up to step i.
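Putting the corrected loop into the original function, it might look like the following (a sketch of my own, not the original answer's code), where every iteration takes a +1 or -1 step:

```r
rw <- function(n) {
  step <- c(1L, -1L)
  x <- numeric(n)   # x[1] starts at 0
  for (i in 2:n) {
    x[i] <- x[i - 1] + sample(step, 1)  # always move by +1 or -1
  }
  x
}

set.seed(42)
walk <- rw(10)
# Every consecutive difference is exactly one step, never zero
stopifnot(all(abs(diff(walk)) == 1))
```

An equivalent vectorised form of the same walk would be c(0, cumsum(sample(c(1L, -1L), n - 1, replace = TRUE))).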
This post addresses the timings of various base R methods for this calculation. It was inspired by comments on the other answers, in particular @josilber's comment on the fastest method, posted by Jake Burkhead.
Below, a variety of methods are used to calculate the random walk. To accomplish this, each function pulls 1000 values of either 1 or -1 as defined in fnc below. The timing test uses microbenchmark with 1000 replications for each method.
fnc <- function(n) sample(c(1L, -1L), n, replace=TRUE)
library(microbenchmark)
microbenchmark(
  all=cumsum(fnc(1000L)),
  reduce=Reduce("+", fnc(1000L), accumulate=TRUE),
  laplyRpCln=cumsum(unlist(lapply(rep.int(1L, 1000L), fnc))),
  laplyRpAn=cumsum(unlist(lapply(rep.int(1L, 1000L), function(x) fnc(1L)))),
  laplySqAn=cumsum(unlist(lapply(seq_len(1000L), function(x) fnc(1L)))),
  saplyRpCln=cumsum(sapply(rep.int(1L, 1000L), fnc)),
  saplyRpAn=cumsum(sapply(rep.int(1L, 1000L), function(x) fnc(1L))),
  saplySqAn=cumsum(sapply(seq_len(1000L), function(x) fnc(1L))),
  vaplyRpCln=cumsum(vapply(rep.int(1L, 1000L), fnc, FUN.VALUE=0)),
  vaplyRpAn=cumsum(vapply(rep.int(1L, 1000L), function(x) fnc(1L), FUN.VALUE=0)),
  vaplySqAn=cumsum(vapply(seq_len(1000L), function(x) fnc(1L), FUN.VALUE=0)),
  replicate=cumsum(replicate(1000L, fnc(1L))),
  forPre={vals <- numeric(1000L); for(i in seq_along(vals)) vals[i] <- fnc(1L); cumsum(vals)},
  forNoPre={vals <- numeric(0L); for(i in seq_len(1000L)) vals <- c(vals, fnc(1L)); cumsum(vals)},
  times=1000)
Here, the methods are:

- all: cumsum on the full sample, pulled all at once.
- reduce: Reduce to perform the summation.
- laplyRpCln: lapply and unlist to return a vector, iterating through 1000 instances of 1 and calling the function directly by name.
- laplyRpAn: as above, but calling fnc through an anonymous function.
- laplySqAn: as above, but iterating over seq rather than rep.
- saplyRpCln, saplyRpAn, saplySqAn: sapply is called instead of lapply/unlist.
- vaplyRpCln, vaplyRpAn, vaplySqAn: vapply is used in place of lapply/unlist.
- replicate: replicate, where the default is simplify=TRUE.
- forPre: a for loop that pre-allocates the vector and then fills it in.
- forNoPre: a for loop that creates an empty numeric(0) vector and then uses c to concatenate to that vector.

This returns
Unit: microseconds
expr min lq mean median uq max neval cld
all 25.634 31.0705 85.66495 33.6890 35.3400 49240.30 1000 a
reduce 542.073 646.7720 780.13592 696.4775 750.2025 51685.44 1000 b
laplyRpCln 4349.384 5026.4015 6433.60754 5409.2485 7209.3405 58494.44 1000 c e
laplyRpAn 4600.200 5281.6190 6513.58733 5682.0570 7488.0865 55239.04 1000 c e
laplySqAn 4616.986 5251.4685 6514.09770 5634.9065 7488.1560 54263.04 1000 c e
saplyRpCln 4362.324 5080.3970 6325.66531 5506.5330 7294.6225 59075.02 1000 cd
saplyRpAn 4701.140 5386.1350 6781.95655 5786.6905 7587.8525 55429.02 1000 e
saplySqAn 4651.682 5342.5390 6551.35939 5735.0610 7525.4725 55148.32 1000 c e
vaplyRpCln 4366.322 5046.0625 6270.66501 5482.8565 7208.0680 63756.83 1000 c
vaplyRpAn 4657.256 5347.2190 6724.35226 5818.5225 7580.3695 64513.37 1000 de
vaplySqAn 4623.897 5325.6230 6475.97938 5769.8130 7541.3895 14614.67 1000 c e
replicate 4722.540 5395.1420 6653.90306 5777.3045 7638.8085 59376.89 1000 c e
forPre 5911.107 6823.3040 8172.41411 7226.7820 9038.9550 56119.11 1000 f
forNoPre 8431.855 10584.6535 11401.64190 10910.0480 11267.5605 58566.27 1000 g
Notice that the first method is clearly the fastest. It is followed by pulling the full sample at once and then using Reduce to perform the summation. Among the *apply functions, the "clean" versions that call the function directly by name show a minor performance improvement, and lapply seems to be on par with vapply, though given the spread of the timings this conclusion is not entirely straightforward. sapply appears to be the slowest, but the style of function call matters more than which *apply variant is used.
The two for loops performed the worst, with the pre-allocating loop outperforming the loop that grows the vector with c.
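The gap between the two loop styles comes from repeated reallocation: growing a vector with c() copies the entire vector on every iteration, while a pre-allocated vector is written in place. A minimal illustration of the two styles (my own sketch, not part of the benchmark above):

```r
n <- 1000L

# Pre-allocated: one allocation, then n in-place writes
pre <- numeric(n)
for (i in seq_len(n)) pre[i] <- i

# Grown with c(): a fresh, larger vector is built each iteration
grown <- numeric(0L)
for (i in seq_len(n)) grown <- c(grown, i)

stopifnot(identical(pre, grown))  # same result, very different cost
```

Both produce the same vector; only the allocation pattern (and therefore the running time) differs, which is what the forPre and forNoPre rows above measure.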
Here, I'm running a patched version of R 3.4.1 (patched roughly August 23, 2017) on openSUSE 42.1.
Please let me know if you see any mistakes and I'll fix them as soon as I can. Thanks to Ben Bolker for prodding me to investigate the final function more, where I found a couple of bugs.