Retain rows in an R dataframe which contain a specific pattern in a column

Question

df is a test dataframe in which I want to retain only the rows which contain the | character in the Hits column and the rows which contain the hits (i.e. Hit1, Hit2 etc.) and also the blank cells in the Hits column.

df <- data.frame(
 Hits = c("# test114", "# ID|987129470", "# 13135", "Hit1", "", "Hit2", "Hit3", "", "# test739", "# ID|6971324987", "# 83771", "Hit1", "Hit2", "", "Hit3"),
 Category1 = c(NA, NA, NA, 0.001, 0.001, 0.002, 0.003, 0.003, NA, NA, NA, 0.023, 0.341, 0.341, 0.569),
 Category2 = c(NA, NA, NA, 100, 100, 99, 98, 98, NA, NA, NA, 100, 95, 95, 97),
 Category3 = c(NA, NA, NA, 100, 100, 99, 98, 98, NA, NA, NA, 98, 97, 97, 92)
)

df looks like this

enter image description here

The output should look something like this

enter image description here

Since I want to retain only the rows having the | character and the ones not having the # character and the blank cells, maybe pattern matching could help. But I can't seem to figure out how to implement it in the code. Any suggestions?

Please note that in the original dataframe, the hits can have different names. Some hits don't start with the term 'Hit'

akrun · Accepted Answer

We could use grep with a regex pattern - "^[^#]+$|\| (Hits column with no # from start (^) to end ($) or (|) the | character - escaped as it is metacharacter)

subset(df, grepl("^[^#]+$|\|", Hits)|Hits == "")
              Hits Category1 Category2 Category3
2   # ID|987129470        NA        NA        NA
4             Hit1     0.001       100       100
5                      0.001       100       100
6             Hit2     0.002        99        99
7             Hit3     0.003        98        98
8                      0.003        98        98
10 # ID|6971324987        NA        NA        NA
12            Hit1     0.023       100        98
13            Hit2     0.341        95        97
14                     0.341        95        97
15            Hit3     0.569        97        92

Or do this in single grepl

> subset(df, grepl("^$|^[^#]+$|\|", Hits))
              Hits Category1 Category2 Category3
2   # ID|987129470        NA        NA        NA
4             Hit1     0.001       100       100
5                      0.001       100       100
6             Hit2     0.002        99        99
7             Hit3     0.003        98        98
8                      0.003        98        98
10 # ID|6971324987        NA        NA        NA
12            Hit1     0.023       100        98
13            Hit2     0.341        95        97
14                     0.341        95        97
15            Hit3     0.569        97        92

or in dplyr

library(dplyr)
library(stringr)
df %>%
   filter(str_detect(Hits, "^[^#]+$|\|")|Hits == "")
             Hits Category1 Category2 Category3
1   # ID|987129470        NA        NA        NA
2             Hit1     0.001       100       100
3                      0.001       100       100
4             Hit2     0.002        99        99
5             Hit3     0.003        98        98
6                      0.003        98        98
7  # ID|6971324987        NA        NA        NA
8             Hit1     0.023       100        98
9             Hit2     0.341        95        97
10                     0.341        95        97
11            Hit3     0.569        97        92

TarJae · Answer

Similar to akruns solution but using no regex solution:

library(dplyr)
library(stringr)
df %>%
  filter(str_detect(Hits, 'ID|Hit'))

              Hits Category1 Category2 Category3
1   # ID|987129470        NA        NA        NA
2             Hit1     0.001       100       100
3             Hit1     0.001       100       100
4             Hit2     0.002        99        99
5             Hit3     0.003        98        98
6             Hit3     0.003        98        98
7  # ID|6971324987        NA        NA        NA
8             Hit1     0.023       100        98
9             Hit2     0.341        95        97
10            Hit2     0.341        95        97
11            Hit3     0.569        97        92

Retain rows in an R dataframe which contain a specific pattern in a column

Tags:

dataframe

r

accibio

2 Answers

akrun

TarJae

Recent Activity

Donate For Us

Retain rows in an R dataframe which contain a specific pattern in a column

Tags:

dataframe

r

accibio

2 Answers

akrun

TarJae

Related questions

Recent Activity

Donate For Us