
R: Window function

I have a data frame DF, with three columns and n rows shown below:

Month Year  Default
1   2015    T
2   2015    T
3   2015    F
4   2015    T
5   2015    T
6   2015    T
7   2015    F

I would like to find every run of at least 3 consecutive T values in Default, scanning through the whole data frame, and write the starting month and year of each such run into a new DF.

The output should look like:

Month   Year
4   2015
Lawrence Ng asked Mar 23 '26 06:03



1 Answer

Here's an attempt using the development version of data.table on GitHub and its new rleid function. It assumes the data frame is named df and that Default is a logical column:

library(data.table) # v 1.9.5+
setDT(df)[, indx := rleid(Default)]
df[(Default), if(.N > 2) .SD[1L], by = indx]
#    indx Month Year Default
# 1:    3     4 2015    TRUE

What we are basically doing here is assigning a unique index to each run of consecutive identical values in Default. Then, looking only at the rows where Default == TRUE, we check for each group whether its size is greater than 2 and, if so, select the first row of that group.
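For readers without data.table, the same run-index idea can be sketched in base R with rle(). This is only an illustration, not part of the original answer: the data frame is rebuilt from the question under the assumption that Default is logical, and the names runs, run_id, run_start, keep, and result are made up for the example.

```r
# Rebuild the question's data; Default is assumed to be a logical column.
df <- data.frame(
  Month   = 1:7,
  Year    = 2015,
  Default = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# rle() returns the length and value of each run of consecutive equal values.
runs <- rle(df$Default)

# One id per run, repeated over the rows of that run
# (the same kind of index that rleid() produces).
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Row number at which each run starts.
run_start <- cumsum(c(1L, head(runs$lengths, -1L)))

# Keep the first row of every TRUE run that spans at least 3 rows.
keep <- runs$values & runs$lengths >= 3L
result <- df[run_start[keep], c("Month", "Year")]
result
```

On the sample data, the only qualifying run starts at month 4 of 2015, which matches the output requested in the question.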


A shorter version (proposed by @Arun) would be

setDT(df)[, if(Default && .N > 2L) .SD[1L], by = .(indx = rleid(Default), Default)]
David Arenburg answered Mar 24 '26 19:03
