I need to subset a dataset based on a column of reference values. For example, given a dataset:
col1 <- c(1,2,3,4)
col2 <- c(1,2,-1,4)
col3 <- c(1,2,-3,-4)
col_Reference <- c(-5,6,-7,8)
df <- cbind(col1,col2,col3,col_Reference)
df
col1 col2 col3 col_Reference
[1,] 1 1 1 -5
[2,] 2 2 2 6
[3,] 3 -1 -3 -7
[4,] 4 4 -4 8
I would like to filter these rows depending on the value in col_Reference. If that value is greater than 0, I want to keep the row only if every other value is also greater than 0. Conversely, if it is lower than 0, I want to keep the row only if every other value is also lower than 0. Allowing 0 mismatches, I would like to get back:
col1 col2 col3 col_Reference
[1,] 2 2 2 6
I would also like to control how many mismatches are allowed. Allowing at most 1 mismatch, I should get back:
col1 col2 col3 col_Reference
[1,] 2 2 2 6
[2,] 3 -1 -3 -7
[3,] 4 4 -4 8
Allowing at most 2:
col1 col2 col3 col_Reference
[1,] 2 2 2 6
[2,] 3 -1 -3 -7
[3,] 4 4 -4 8
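To make the rule concrete, the sketch below counts, for each row of the example matrix, how many values differ in sign from col_Reference. It assumes the reference is the last column, as in df above. apply() with MARGIN = 1 calls the function once per row, and because each call returns a single number, the results are simplified to a plain vector.

```r
# Rebuild the example matrix from the question
col1 <- c(1, 2, 3, 4)
col2 <- c(1, 2, -1, 4)
col3 <- c(1, 2, -3, -4)
col_Reference <- c(-5, 6, -7, 8)
df <- cbind(col1, col2, col3, col_Reference)

# For each row (MARGIN = 1), count the values whose sign differs from
# the sign of the reference value stored in the last column
n_mismatch <- apply(df, 1, function(r) {
  ref <- r[length(r)]
  sum(sign(r[-length(r)]) != sign(ref))
})
n_mismatch
# [1] 3 0 1 1
```

Rows 3 and 4 each have exactly one mismatch, so both pass as soon as one mismatch is allowed, while row 1 (three mismatches) only passes with a limit of 3.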
I guess I should use apply(), but I must admit I'm not very good at using it :(
Thanks a lot
Not the most elegant solution, but this does the trick!
# Create the test data (cbind() returns a matrix, not a data frame)
col1 <- c(1,2,3,4)
col2 <- c(1,2,-1,4)
col3 <- c(1,2,-3,-4)
col_Reference <- c(-5,6,-7,8)
df <- cbind(col1,col2,col3,col_Reference)
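# Keep the rows whose values differ in sign from col_Reference at most `mismatch` times
fun <- function(df, mismatch = 0){
  df <- as.matrix(df)
  ref_col <- which(colnames(df) == "col_Reference")
  # For each row, TRUE if the number of sign mismatches is within the limit
  keep <- apply(df, 1, function(r){
    sum(sign(r[-ref_col]) != sign(r[ref_col])) <= mismatch
  })
  # Subset with the logical vector; drop = FALSE keeps a matrix even when 0 or 1 rows remain
  df[keep, , drop = FALSE]
}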
Now, call the function!
fun(df)
col1 col2 col3 col_Reference
[1,] 2 2 2 6
fun(df, mismatch = 1)
col1 col2 col3 col_Reference
[1,] 2 2 2 6
[2,] 3 -1 -3 -7
[3,] 4 4 -4 8
fun(df, mismatch = 2)
col1 col2 col3 col_Reference
[1,] 2 2 2 6
[2,] 3 -1 -3 -7
[3,] 4 4 -4 8
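If you'd rather avoid a per-row apply() altogether, the same filter can be vectorised. This is a different technique from the answer above (an element-wise sign comparison plus rowSums()), shown as a sketch; the function name filter_by_sign and its ref argument are hypothetical additions that let you choose the reference column by name.

```r
# Hypothetical helper: vectorised version of the sign-mismatch filter
filter_by_sign <- function(m, mismatch = 0, ref = "col_Reference") {
  m <- as.matrix(m)
  ref_sign <- sign(m[, ref])
  # Signs of every non-reference column; drop = FALSE keeps a matrix
  other_sign <- sign(m[, colnames(m) != ref, drop = FALSE])
  # A vector of length nrow(m) is recycled down each column,
  # so row i is compared with ref_sign[i]
  n_mismatch <- rowSums(other_sign != ref_sign)
  m[n_mismatch <= mismatch, , drop = FALSE]
}

col1 <- c(1, 2, 3, 4)
col2 <- c(1, 2, -1, 4)
col3 <- c(1, 2, -3, -4)
col_Reference <- c(-5, 6, -7, 8)
df <- cbind(col1, col2, col3, col_Reference)

filter_by_sign(df)                # keeps row 2 only
filter_by_sign(df, mismatch = 1)  # keeps rows 2, 3 and 4
```

Because there is no per-row function call and no rbind() step, it returns a matrix (possibly with zero rows) for any mismatch value. Note that sign(0) is 0, so a zero counts as a mismatch against both positive and negative references; the question doesn't say how zeros should be treated.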