I have a huge data frame. One column is an integer that takes the values 1 or 2. I need a way to find runs of consecutive rows in which this column has a certain value at least a minimum number of times in a row, subset those rows, and later process them into graphs.
Below is a small example that does at least part of the job: it prints the subsets I am looking for. But two questions remain:
I have already looked at aggregate and ddply, but could not come up with a solution.
Any help is highly appreciated.
```r
test <- c(rep(1,3), rep(2,5), rep(1,3), rep(2,3), rep(1,3), rep(2,8), rep(1,3))
letters <- c("a","b","c","d")      # note: masks R's built-in `letters` constant
a1 <- as.data.frame(cbind(test, letters))
BZ <- 2     # the value to look for
n_BZ <- 4   # the minimum number of consecutive appearances
k <- 1      # list index under which the next subset will be stored
for (i in 2:nrow(a1)) {
  if (a1$test[i-1] != BZ && a1$test[i] == BZ) {          # when "test" BECOMES 2 ...
    t_temp <- a1[i, ]                                    # ... start a temporary data frame
  } else if (a1$test[i-1] == BZ && a1$test[i] == BZ) {   # when "test" REMAINS 2 ...
    t_temp <- rbind(t_temp, a1[i, ])                     # ... keep appending to it
  } else if (a1$test[i-1] == BZ && a1$test[i] != BZ) {   # when "test" STOPS being 2 ...
    if (nrow(t_temp) >= n_BZ) {                          # ... check the run is long enough
      print(t_temp)                                      # print it (desired: store it as list item k)
      k <- k + 1                                         # increase k
    }
  } else {                                               # neither this row nor the previous one is 2:
    t_temp <- NULL                                       # reset
  }
}
# caveat: a run that starts in row 1 or reaches the last row is never printed
```
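For comparison, here is a minimal sketch of the same loop idea that stores each qualifying run in a list instead of printing it, and also catches runs that start in row 1 or end in the last row. It assumes the data is built with `data.frame()` rather than `cbind()`, so `test` stays numeric; the names `runs` and `start` are hypothetical.

```r
# Rebuild the example data; data.frame() keeps `test` numeric
test <- c(rep(1, 3), rep(2, 5), rep(1, 3), rep(2, 3), rep(1, 3), rep(2, 8), rep(1, 3))
a1 <- data.frame(test = test, letters = rep(letters[1:4], length.out = length(test)))
BZ <- 2
n_BZ <- 4

runs <- list()  # hypothetical name: one list item per qualifying run
start <- NA     # hypothetical name: first row of the current run (NA = not in a run)
for (i in seq_len(nrow(a1))) {
  in_run <- a1$test[i] == BZ
  if (in_run && is.na(start)) {
    start <- i  # a run of BZ begins here (works for row 1 too)
  }
  # the run ends at row i if the next row is not BZ or there is no next row;
  # `||` short-circuits, so a1$test[i + 1] is never read past the last row
  if (in_run && (i == nrow(a1) || a1$test[i + 1] != BZ)) {
    if (i - start + 1 >= n_BZ) {
      runs[[length(runs) + 1]] <- a1[start:i, ]
    }
    start <- NA
  }
}
```

With the example data this keeps two runs, rows 4 to 8 and rows 18 to 25.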
The `rle` function (run-length encoding) is really handy for this kind of task. It takes an atomic vector and returns a list of class "rle" with two components, `lengths` and `values`, where `lengths[j]` is the number of consecutive repeats of `values[j]`.
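To make this concrete, here is `rle` applied to the `test` vector from the question. The comments show what each component contains; `inverse.rle()` (also in base R) turns the encoding back into the original vector.

```r
test <- c(rep(1, 3), rep(2, 5), rep(1, 3), rep(2, 3), rep(1, 3), rep(2, 8), rep(1, 3))
r <- rle(test)
r$lengths               # 3 5 3 3 3 8 3: length of each run
r$values                # 1 2 1 2 1 2 1: the value repeated in each run
inverse.rle(r)          # reconstructs `test` from the encoding
```

So the runs of 2 have lengths 5, 3 and 8, and only the first and last of these reach `n_BZ = 4`.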
Since `cbind` in your example produces a character matrix, `as.data.frame` turns the `test` column into a factor (or, since R 4.0.0, into a character vector), so I first converted it to numeric:
```r
a1 <- within(a1, test <- as.numeric(as.character(test)))
```
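The detour through `as.character` matters when the column is a factor: `as.numeric` on a factor returns the internal level codes, not the numbers printed. A small illustration with made-up values:

```r
f <- factor(c("10", "5", "10"))
levels(f)                     # "10" "5": levels are sorted as strings
as.numeric(f)                 # 1 2 1: the level codes, not the values
as.numeric(as.character(f))   # 10 5 10: the actual values
as.numeric(levels(f))[f]      # 10 5 10: same result, converting each level only once
```

For the example data the codes happen to coincide with the values (levels "1" and "2"), which is why the bug would stay hidden there.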
Then the result can be obtained in a nice (essentially) one-liner:
```r
with(rle(a1$test),
     split(a1, rep(seq_along(lengths), lengths))[values == BZ & lengths >= n_BZ]
)
```
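The one-liner packs three steps together. Here they are unrolled, using a rebuilt numeric version of the example data (the intermediate names `run_id`, `pieces`, `keep`, `starts` and `ends` are hypothetical). The last two lines are an optional extra: they compute each run's first and last row, which may be enough if you only need row ranges rather than copies of the data.

```r
test <- c(rep(1, 3), rep(2, 5), rep(1, 3), rep(2, 3), rep(1, 3), rep(2, 8), rep(1, 3))
a1 <- data.frame(test = test, letters = rep(letters[1:4], length.out = length(test)))
BZ <- 2
n_BZ <- 4

r <- rle(a1$test)
# 1. give every row the id of the run it belongs to: 1 1 1 2 2 2 2 2 3 ...
run_id <- rep(seq_along(r$lengths), r$lengths)
# 2. split the data frame into one piece per run (element j = run j)
pieces <- split(a1, run_id)
# 3. keep only runs of BZ that are at least n_BZ rows long
keep <- r$values == BZ & r$lengths >= n_BZ
result <- pieces[keep]
names(result)    # "2" "6": the ids of the kept runs

# optional: first and last row of each kept run
ends <- cumsum(r$lengths)
starts <- ends - r$lengths + 1
```

Here `starts[keep]` is 4 and 18 and `ends[keep]` is 8 and 25, matching the two subsets in `result`.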