I want to delete all rows that contain a value larger than 7 in any column, either checking across all columns or only across specific columns.
a <- c(3,6,99,7,8,9)
b <- c(99,6,3,4,5,6)
c <- c(2,5,6,7,8,3)
df <- data.frame(a, b, c)
   a  b c
1  3 99 2
2  6  6 5
3 99  3 6
4  7  4 7
5  8  5 8
6  9  6 3
V1: I want to delete all rows containing values larger than 7, regardless of the column.
# result V1
   a  b c
2  6  6 5
4  7  4 7
V2: I want to delete all rows containing values larger than 7 in column b or column c (column a is ignored).
# result V2
   a  b c
2  6  6 5
3 99  3 6
4  7  4 7
6  9  6 3
There are plenty of similar problems on Stack Overflow, but I couldn't find a solution to this one. So far I can only find rows that include 7, using res <- df[rowSums(df != 7) < ncol(df), ].
If we prefer to work with the tidyverse, dplyr's filter() function keeps (or, with a negated condition, removes) rows based on the values in a column, much like base R's subset(). dplyr also provides slice(), which selects or removes rows by their index.
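A minimal sketch of both variants with dplyr, assuming dplyr 1.0.4 or later (the version that introduced if_all()). if_all() applies the same condition to every selected column and keeps a row only if the condition holds in all of them, which is the same as dropping rows where any selected column exceeds 7.

```r
library(dplyr)

df <- data.frame(a = c(3, 6, 99, 7, 8, 9),
                 b = c(99, 6, 3, 4, 5, 6),
                 c = c(2, 5, 6, 7, 8, 3))

# V1: keep only rows where every column is <= 7
v1 <- filter(df, if_all(everything(), ~ .x <= 7))

# V2: keep rows where columns b and c are both <= 7 (column a is not checked).
# all_of() with strings avoids any confusion between column `c` and the function c()
v2 <- filter(df, if_all(all_of(c("b", "c")), ~ .x <= 7))

print(v1)
print(v2)
```

Note that filter() drops rows where the condition evaluates to NA, so rows with missing values in the checked columns are removed as well.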
rowSums() of the logical matrix df > 7 gives the number of TRUE values in each row, so a row with no TRUE values gets 0. Negating the result with ! turns 0 into TRUE and every nonzero count into FALSE, and this logical vector can be used directly to subset the rows.
df[!rowSums(df > 7), ]
#  a b c
#2 6 6 5
#4 7 4 7
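To see why the one-liner works, it can be unpacked into its intermediate steps. This sketch recreates df and uses hypothetical names (big, n_big, keep) for each step; the final subset is identical to df[!rowSums(df > 7), ].

```r
df <- data.frame(a = c(3, 6, 99, 7, 8, 9),
                 b = c(99, 6, 3, 4, 5, 6),
                 c = c(2, 5, 6, 7, 8, 3))

# Step 1: logical matrix, TRUE wherever a cell is greater than 7
big <- df > 7

# Step 2: count the TRUE cells in each row -> 1 0 1 0 2 1
n_big <- rowSums(big)

# Step 3: negate; a count of 0 becomes TRUE (keep the row),
# any positive count becomes FALSE (drop the row)
keep <- !n_big

# Step 4: use the logical vector to select rows; original row names are kept
res <- df[keep, ]
print(res)
```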
For V2, we use the same principle, except that we compute the logical matrix on a subset of df, i.e., only the second and third columns (df[-1] drops the first column).
df[!rowSums(df[-1] > 7), ]
#   a b c
#2  6 6 5
#3 99 3 6
#4  7 4 7
#6  9 6 3
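df[-1] relies on column position, so it silently checks the wrong columns if the column order changes. The sketch below selects the columns by name instead and wraps both variants in a hypothetical helper, drop_rows_above(), which is not a base R function. As an assumption about the desired behaviour, it passes na.rm = TRUE to rowSums() so that NA cells are treated as not exceeding the threshold; without it, a row containing NA would produce NA in the row index and come back as a row of NAs.

```r
# Hypothetical helper (not part of base R): drop every row in which any of
# the columns named in `cols` holds a value greater than `threshold`.
drop_rows_above <- function(data, threshold, cols = names(data)) {
  # drop = FALSE keeps a data frame even when only one column is selected
  exceeds <- data[, cols, drop = FALSE] > threshold
  # na.rm = TRUE: NA cells are ignored rather than turning the row index into NA;
  # "== 0" is equivalent to the "!" used above, just more explicit
  data[rowSums(exceeds, na.rm = TRUE) == 0, , drop = FALSE]
}

df <- data.frame(a = c(3, 6, 99, 7, 8, 9),
                 b = c(99, 6, 3, 4, 5, 6),
                 c = c(2, 5, 6, 7, 8, 3))

v1 <- drop_rows_above(df, 7)                      # V1: check all columns
v2 <- drop_rows_above(df, 7, cols = c("b", "c"))  # V2: check b and c only
```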