Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Retain rows in an R dataframe which contain a specific pattern in a column

Tags:

dataframe

r

df is a test dataframe in which I want to retain only the rows which contain the | character in the Hits column and the rows which contain the hits (i.e. Hit1, Hit2 etc.) and also the blank cells in the Hits column.

df <- data.frame(
 Hits = c("# test114", "# ID|987129470", "# 13135", "Hit1", "", "Hit2", "Hit3", "", "# test739", "# ID|6971324987", "# 83771", "Hit1", "Hit2", "", "Hit3"),
 Category1 = c(NA, NA, NA, 0.001, 0.001, 0.002, 0.003, 0.003, NA, NA, NA, 0.023, 0.341, 0.341, 0.569),
 Category2 = c(NA, NA, NA, 100, 100, 99, 98, 98, NA, NA, NA, 100, 95, 95, 97),
 Category3 = c(NA, NA, NA, 100, 100, 99, 98, 98, NA, NA, NA, 98, 97, 97, 92)
)

df looks like this

enter image description here

The output should look something like this

enter image description here

Since I want to retain only the rows having the | character and the ones not having the # character and the blank cells, maybe pattern matching could help. But I can't seem to figure out how to implement it in the code. Any suggestions?

Please note that in the original dataframe, the hits can have different names. Some hits don't start with the term 'Hit'

like image 280
accibio Avatar asked Oct 15 '25 07:10

accibio


2 Answers

We could use grep with a regex pattern - "^[^#]+$|\\| (Hits column with no # from start (^) to end ($) or (|) the | character - escaped as it is metacharacter)

subset(df, grepl("^[^#]+$|\\|", Hits)|Hits == "")
              Hits Category1 Category2 Category3
2   # ID|987129470        NA        NA        NA
4             Hit1     0.001       100       100
5                      0.001       100       100
6             Hit2     0.002        99        99
7             Hit3     0.003        98        98
8                      0.003        98        98
10 # ID|6971324987        NA        NA        NA
12            Hit1     0.023       100        98
13            Hit2     0.341        95        97
14                     0.341        95        97
15            Hit3     0.569        97        92

Or do this in single grepl

> subset(df, grepl("^$|^[^#]+$|\\|", Hits))
              Hits Category1 Category2 Category3
2   # ID|987129470        NA        NA        NA
4             Hit1     0.001       100       100
5                      0.001       100       100
6             Hit2     0.002        99        99
7             Hit3     0.003        98        98
8                      0.003        98        98
10 # ID|6971324987        NA        NA        NA
12            Hit1     0.023       100        98
13            Hit2     0.341        95        97
14                     0.341        95        97
15            Hit3     0.569        97        92

or in dplyr

library(dplyr)
library(stringr)
df %>%
   filter(str_detect(Hits, "^[^#]+$|\\|")|Hits == "")
             Hits Category1 Category2 Category3
1   # ID|987129470        NA        NA        NA
2             Hit1     0.001       100       100
3                      0.001       100       100
4             Hit2     0.002        99        99
5             Hit3     0.003        98        98
6                      0.003        98        98
7  # ID|6971324987        NA        NA        NA
8             Hit1     0.023       100        98
9             Hit2     0.341        95        97
10                     0.341        95        97
11            Hit3     0.569        97        92
like image 86
akrun Avatar answered Oct 17 '25 21:10

akrun


Similar to akruns solution but using no regex solution:

library(dplyr)
library(stringr)
df %>%
  filter(str_detect(Hits, 'ID|Hit'))
              Hits Category1 Category2 Category3
1   # ID|987129470        NA        NA        NA
2             Hit1     0.001       100       100
3             Hit1     0.001       100       100
4             Hit2     0.002        99        99
5             Hit3     0.003        98        98
6             Hit3     0.003        98        98
7  # ID|6971324987        NA        NA        NA
8             Hit1     0.023       100        98
9             Hit2     0.341        95        97
10            Hit2     0.341        95        97
11            Hit3     0.569        97        92
like image 36
TarJae Avatar answered Oct 17 '25 20:10

TarJae



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!