Remove NA strings from table (characters) in R

Tags: regex, r, gsub

How can I remove NA strings in a simple data frame like the one below, which consists of a single column, in R?

head(test)
Column1 
[1] "Gene1 Gene2 Gene3 NA NA NA NA" 
[2] "Gene41 NAGene218 GeneX NA"
[3] "Gene19 GeneNA NA NA NA NA NA"

Some gene names start or end with 'NA', and those must be kept, so the gsub regex has to take the position of the NA in the string into account. I tried something like test2 <- gsub('^ NA$', "", test$Column1), with ^ meaning that ' NA' has to be at the start of the string and $ that it has to be at the end. I am sure it's something simple, but I don't understand what I am doing wrong, as I am not very familiar with these regex symbols.

[UPDATE] - Desired output

head(test2)
Column1 
[1] "Gene1 Gene2 Gene3" 
[2] "Gene41 NAGene218 GeneX"
[3] "Gene19 GeneNA"
Rodrigo Duarte asked Jan 19 '26 03:01

1 Answer

You may use

test$Column1 <- gsub("^NA(?:\\s+NA)*\\b\\s*|\\s*\\bNA(?:\\s+NA)*$", "", test$Column1)

Details

  • ^NA(?:\s+NA)*\b\s* - Alternative 1 (a run of NA tokens at the start of the string):
    • ^ - start of string
    • NA - the literal text NA
    • (?:\s+NA)* - zero or more repetitions of one or more whitespace characters followed by NA
    • \b - a word boundary, so that the NA in NAGene is not matched
    • \s* - zero or more whitespace characters
  • | - or
  • \s*\bNA(?:\s+NA)*$ - Alternative 2 (a run of NA tokens at the end of the string):
    • \s* - zero or more whitespace characters
    • \b - a word boundary, so that the NA in GeneNA is not matched
    • NA - the literal text NA
    • (?:\s+NA)* - zero or more repetitions of one or more whitespace characters followed by NA
    • $ - end of string.
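
To try the pattern, here is a minimal sketch that rebuilds the sample data from the question. The fourth string is a hypothetical extra case, not from the question, added only to exercise Alternative 1 (leading NA tokens). The names pattern, attempt, and cleaned are illustrative. Note that each regex backslash is doubled inside the R string literal.

```r
# Sample data rebuilt from the question. The fourth row is a hypothetical
# extra case with leading NA tokens, added for illustration only.
test <- data.frame(
  Column1 = c(
    "Gene1 Gene2 Gene3 NA NA NA NA",
    "Gene41 NAGene218 GeneX NA",
    "Gene19 GeneNA NA NA NA NA NA",
    "NA NA NAGene5 Gene6"
  ),
  stringsAsFactors = FALSE
)

# The pattern from the answer, stored in a variable for readability.
pattern <- "^NA(?:\\s+NA)*\\b\\s*|\\s*\\bNA(?:\\s+NA)*$"

# The attempt from the question: ^ and $ anchor the whole pattern, so it only
# matches a string that consists of exactly " NA". None of the rows do.
attempt <- gsub("^ NA$", "", test$Column1)

# Remove runs of standalone NA tokens at the start or end of each string.
cleaned <- gsub(pattern, "", test$Column1)

print(attempt)
print(cleaned)
```

For the three strings taken from the question, cleaned should match the desired output listed there, while attempt stays identical to the input. Standalone NA tokens in the middle of a string are not touched by this pattern, since both alternatives are anchored to an end of the string.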
Wiktor Stribiżew answered Jan 21 '26 19:01
