I have a data frame with 1000 rows and I want to perform some operation on it 100 rows at a time. How can I use a counter over the row numbers to select 100 rows at a time (1 to 100, then 101 to 200, and so on up to 1000) and perform the operation on each subset using a for loop? Can anyone suggest how this can be done? I could not find a good method.
An easy way would be to create a grouping variable, then use split() and lapply() to do whatever operations you need to.
Your grouping can be easily created using rep().
Here is an example:
set.seed(1)
demo = data.frame(A = sample(300, 50, replace=TRUE),
                  B = rnorm(50))
demo$groups = rep(1:5, each=10)
demo.split = split(demo, demo$groups)
lapply(demo.split, colMeans)
# $`1`
#           A           B      groups 
# 165.9000000  -0.1530186   1.0000000 
# 
# $`2`
#           A           B      groups 
# 168.2000000   0.1141589   2.0000000 
# 
# $`3`
#           A           B      groups 
# 126.0000000   0.1625241   3.0000000 
# 
# $`4`
#           A           B      groups 
# 159.4000000   0.3340555   4.0000000 
# 
# $`5`
#           A           B      groups 
# 181.8000000   0.0363812   5.0000000 
If you prefer not to add the groups to your source data.frame, you can achieve the same effect by doing the following:
groups = rep(1:5, each=10)
lapply(split(demo, groups), colMeans)
Of course, replace colMeans with whatever function you want.
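As a rough illustration of swapping in your own function, here is a minimal sketch. The helper summarise_chunk is a made-up name for this example, not something from the question, and it just stands in for whatever per-chunk operation you actually need:

```r
set.seed(1)
demo = data.frame(A = sample(300, 50, replace=TRUE),
                  B = rnorm(50))
groups = rep(1:5, each=10)

# Hypothetical per-chunk operation: row count and range of column A
summarise_chunk = function(chunk) {
  c(n = nrow(chunk), min_A = min(chunk$A), max_A = max(chunk$A))
}

# Apply the function to each 10-row chunk
per_chunk = lapply(split(demo, groups), summarise_chunk)

# Bind the per-chunk results into one matrix, one row per group
do.call(rbind, per_chunk)
```

Because each call returns a named vector of the same length, do.call(rbind, ...) is one convenient way to collect the results into a single table.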
Using your example of a data.frame with 1000 rows, your rep() statement should be something like:
rep(1:10, each=100)
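If the number of rows is not an exact multiple of the chunk size, rep() with each= will not line up with the data. One possible alternative, sketched below, derives the group from the row index instead. The names df and chunk_size are assumptions made for this example, and the explicit for loop is only there because the question asked for one:

```r
# Hypothetical data: 1000 rows, as in the question
df = data.frame(A = seq_len(1000), B = rnorm(1000))
chunk_size = 100

# Rows 1-100 get group 1, rows 101-200 get group 2, and so on;
# a final partial chunk would simply get the last group number
groups = ceiling(seq_len(nrow(df)) / chunk_size)
chunks = split(df, groups)

# The for-loop form: one result per chunk, stored in a preallocated list
results = vector("list", length(chunks))
for (i in seq_along(chunks)) {
  results[[i]] = colMeans(chunks[[i]])
}
```

The loop and lapply(chunks, colMeans) compute the same values; the loop just makes the counter explicit.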