
R delete first and last x % of rows

Tags: r

I have a data frame with 3 ID variables, then several values for each ID.

user   Log Pass  Value
2       2   123     342
2       2   123     543
2       2   123     231
2       2   124     257
2       2   124     342
4       3   125     543
4       3   125     231
4       3   125     257
4       3   125     342
4       3   125     543
4       3   125     231
4       3   125     257
4       3   125     543
4       3   125     231
4       3   125     257
4       3   125     543
4       3   125     231
4       3   125     257
4       3   125     543
4       3   125     231
4       3   125     257

The start and end of each set of values is sometimes noisy, and I want to be able to delete those values. Unfortunately the number of values per ID varies significantly, but it is always the first and last 20% of values that are noisy.

I want to delete the first and last 20% of rows for each ID, with a minimum of 1 row deleted at each end.

So for instance, if there are 20 values for user 2, log 2, pass 123, I want to delete the first and last 4 rows. If there are only 3 values for an ID, I want to delete the first and last row.

The resulting dataset would be:

user   Log Pass  Value
2       2   123     543
4       3   125     543
4       3   125     231
4       3   125     257
4       3   125     543
4       3   125     231
4       3   125     257
4       3   125     543
4       3   125     231
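
The rule above does not say how a fractional 20% should be rounded. In the desired result, the 16-row group (user 4, log 3, pass 125) keeps 8 rows, so it loses 4 at each end, which is consistent with rounding up. A minimal sketch of that arithmetic, assuming rounding up is what is intended (n_trim is a hypothetical helper name, not something from the question):

```r
# Hypothetical helper: number of rows to drop at EACH end of a group of size n.
# Assumption: fractions are rounded up, with a floor of one row.
n_trim <- function(n, p = 0.20) pmax(1, ceiling(n * p))

# Group sizes mentioned in the question (20 and 3) plus the 16-row group.
n_trim(c(20, 3, 16))                  # rounding up
pmax(1, round(c(20, 3, 16) * 0.20))   # rounding to nearest, for comparison
```

The two rules agree for 20 and 3 rows but differ for 16 rows (4 versus 3 rows per end), so the choice of rounding affects the result.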

I've tried fiddling around with nrow but I struggle to figure out how to reference the % of rows by id variable.

Thanks.

Jonathan.

Jonathan Nolan asked Dec 20 '25 03:12


1 Answer

I believe the following can do it.

DATA.

dat <-
structure(list(user = c(2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), Log = c(2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L), Pass = c(123L, 123L, 123L, 124L, 124L, 125L, 125L, 
125L, 125L, 125L, 125L, 125L, 125L, 125L, 125L, 125L, 125L, 125L, 
125L, 125L, 125L), Value = c(342L, 543L, 231L, 257L, 342L, 543L, 
231L, 257L, 342L, 543L, 231L, 257L, 543L, 231L, 257L, 543L, 231L, 
257L, 543L, 231L, 257L)), .Names = c("user", "Log", "Pass", "Value"
), class = "data.frame", row.names = c(NA, -21L))

CODE.

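# Drop the first and last p (default 20%) of the rows of x,
# removing at least one row from each end.
fun <- function(x, p = 0.20){
    n <- nrow(x)
    m <- max(1, round(n*p))                    # rows to drop at each end
    inx <- c(seq_len(m), n - seq_len(m) + 1)   # first m and last m row indices
    x[-inx, ]
}

# Apply the function per user and recombine the pieces.
result <- do.call(rbind, lapply(split(dat, dat$user), fun))
row.names(result) <- NULL
result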
#   user Log Pass Value
#1     2   2  123   543
#2     2   2  123   231
#3     2   2  124   257
#4     4   3  125   342
#5     4   3  125   543
#6     4   3  125   231
#7     4   3  125   257
#8     4   3  125   543
#9     4   3  125   231
#10    4   3  125   257
#11    4   3  125   543
#12    4   3  125   231
#13    4   3  125   257
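
Note that the call above splits on user alone and rounds to the nearest integer, so its output has 13 rows rather than the 9 rows listed as the desired result in the question. The following is a hedged variant, not the answerer's code: it assumes a group is defined by all three ID columns (user, Log, Pass) and that the row count is rounded up. The name trim_ends is hypothetical, and the data frame is rebuilt compactly so the block runs on its own.

```r
# Same example data as in the question, rebuilt compactly.
dat <- data.frame(
  user  = rep(c(2L, 4L), times = c(5L, 16L)),
  Log   = rep(c(2L, 3L), times = c(5L, 16L)),
  Pass  = rep(c(123L, 124L, 125L), times = c(3L, 2L, 16L)),
  Value = c(342L, 543L, 231L, 257L, 342L, 543L, 231L, 257L, 342L,
            543L, 231L, 257L, 543L, 231L, 257L, 543L, 231L, 257L,
            543L, 231L, 257L)
)

# Hypothetical variant: drop the first and last p of the rows of x,
# rounding the count UP (assumption) and dropping at least one row per end.
trim_ends <- function(x, p = 0.20) {
  n <- nrow(x)
  m <- max(1, ceiling(n * p))
  i <- seq_len(n)
  # Keep only the rows strictly inside the two trimmed ends.
  x[i > m & i <= n - m, , drop = FALSE]
}

# Split on every ID column; drop = TRUE discards ID combinations with no rows.
groups  <- split(dat, list(dat$user, dat$Log, dat$Pass), drop = TRUE)
trimmed <- do.call(rbind, lapply(groups, trim_ends))
row.names(trimmed) <- NULL
trimmed
```

On the example data this keeps one row for pass 123, no rows for pass 124 (both of its two rows are trimmed) and eight rows for pass 125. The groups come back in the order of the split levels, which matches the original row order here but may not for other data.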
Rui Barradas answered Dec 22 '25 19:12


