 

Subsetting a data frame depending on whether the value in a reference column is greater or lower than 0

Tags:

r

I need to subset a dataset based on a column of reference values. For example, given a dataset:

col1 <- c(1,2,3,4)
col2 <- c(1,2,-1,4)
col3 <- c(1,2,-3,-4)
col_Reference <- c(-5,6,-7,8)
df <- cbind(col1,col2,col3,col_Reference)
df
     col1 col2 col3 col_Reference
[1,]    1    1    1            -5
[2,]    2    2    2             6
[3,]    3   -1   -3            -7
[4,]    4    4   -4             8

I would like to filter these rows depending on the value in col_Reference. If the reference value is greater than 0, I want to keep the row only if every other value in it is also greater than 0. If the reference value is lower than 0, I want to keep the row only if every other value in it is also lower than 0. Allowing 0 mismatches, I would like to get back:

     col1 col2 col3 col_Reference
[1,]    2    2    2             6
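The rule can be phrased as a per-row count: compare the sign of each value column with the sign of col_Reference and count the disagreements. A row is kept when that count is at most the number of allowed mismatches. A minimal sketch in base R, assuming every column except col_Reference is a value column (the names values, ref_sign and n_mismatch are only illustrative):

```r
col1 <- c(1, 2, 3, 4)
col2 <- c(1, 2, -1, 4)
col3 <- c(1, 2, -3, -4)
col_Reference <- c(-5, 6, -7, 8)
df <- cbind(col1, col2, col3, col_Reference)

# Every column except the reference holds values to check
values <- df[, colnames(df) != "col_Reference", drop = FALSE]
ref_sign <- sign(df[, "col_Reference"])

# sign(values) is a 4 x 3 matrix; comparing it with the length-4 vector
# ref_sign recycles ref_sign down each column, so element [i, j] is
# compared with ref_sign[i]
n_mismatch <- rowSums(sign(values) != ref_sign)
n_mismatch
# [1] 3 0 1 1
```

With 0 mismatches allowed only row 2 survives; with at most 1 mismatch, rows 2, 3 and 4 survive, since row 4 disagrees only in col3.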

Then I would also like to control how many mismatches are allowed. Allowing at most 1 mismatch, I should get back:

     col1 col2 col3 col_Reference
[1,]    2    2    2             6
[2,]    3   -1   -3            -7
[3,]    4    4   -4             8

Allowing at most 2:

     col1 col2 col3 col_Reference
[1,]    2    2    2             6
[2,]    3   -1   -3            -7
[3,]    4    4   -4             8

I guess I should use apply(), but I must admit I'm not very good at using it. :(
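For reference, apply(X, 1, FUN) calls FUN once per row, passing that row as a named vector, and collects one result per row. A small sketch on the example matrix (is_positive is a hypothetical name) that returns one TRUE/FALSE per row and then uses it to subset:

```r
df <- cbind(col1 = c(1, 2, 3, 4),
            col2 = c(1, 2, -1, 4),
            col3 = c(1, 2, -3, -4),
            col_Reference = c(-5, 6, -7, 8))

# MARGIN = 1 means "by row": r is one row, e.g. c(col1 = 1, col2 = 1, ...)
is_positive <- apply(df, 1, function(r) r[["col_Reference"]] > 0)
is_positive
# [1] FALSE  TRUE FALSE  TRUE

# A logical vector with one entry per row can be used to subset rows
df[is_positive, , drop = FALSE]
```

Returning a single logical per row keeps the result of apply() a plain vector, which is usually easier to work with than returning whole rows.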

Thanks a lot

asked Dec 13 '25 01:12 by luca tucciarone


1 Answer

Not the most elegant solution, but this does the trick!

#Create the testing dataframe
col1 <- c(1,2,3,4)
col2 <- c(1,2,-1,4)
col3 <- c(1,2,-3,-4)
col_Reference <- c(-5,6,-7,8)
df <- cbind(col1,col2,col3,col_Reference)

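#Create the function to do what we want
fun <- function(df, mismatch = 0){
  df <- as.matrix(df)
  # For each row, count the value columns whose sign differs from the sign of col_Reference.
  # Returning one logical per row (instead of the row or NULL) lets apply() always
  # simplify to a vector, so this also works when all rows, or none, pass
  keep <- apply(df, 1, function(r){
    values <- r[names(r) != 'col_Reference']
    sum(sign(values) != sign(r[['col_Reference']])) <= mismatch
  })
  # Keep only the rows with at most `mismatch` disagreements
  df[keep, , drop = FALSE]
}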

Now, call the function!

fun(df)

     col1 col2 col3 col_Reference
[1,]    2    2    2             6

fun(df, mismatch = 1)

     col1 col2 col3 col_Reference
[1,]    2    2    2             6
[2,]    3   -1   -3            -7
[3,]    4    4   -4             8

fun(df, mismatch = 2)

     col1 col2 col3 col_Reference
[1,]    2    2    2             6
[2,]    3   -1   -3            -7
[3,]    4    4   -4             8
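The per-row apply() can also be replaced by a vectorized count with rowSums(), which avoids calling an R function once per row. A minimal sketch, assuming every column other than the reference is a value column; filter_by_sign and its ref argument are hypothetical names. Note that sign(0) is 0, so a value of exactly 0 (or a reference of 0) counts as a mismatch, here as well as in fun() above.

```r
# Hypothetical vectorized alternative to fun()
filter_by_sign <- function(df, mismatch = 0, ref = "col_Reference") {
  m <- as.matrix(df)
  values <- m[, colnames(m) != ref, drop = FALSE]
  # Number of value columns whose sign differs from the reference sign, per row
  n_mismatch <- rowSums(sign(values) != sign(m[, ref]))
  # Subset the original object so a data frame stays a data frame
  df[n_mismatch <= mismatch, , drop = FALSE]
}

df <- cbind(col1 = c(1, 2, 3, 4),
            col2 = c(1, 2, -1, 4),
            col3 = c(1, 2, -3, -4),
            col_Reference = c(-5, 6, -7, 8))

filter_by_sign(df)                # row 2 only
filter_by_sign(df, mismatch = 1)  # rows 2, 3 and 4
filter_by_sign(as.data.frame(df), mismatch = 2)
```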

answered Dec 14 '25 13:12 by BurlyPotatoMan


