
Delete rows that consist of strings

Tags: regex, r

I have a column that should consist of numbers only, but it also contains letters and other symbols. As a result, R treats the column Housenumber as character.

For instance:

Housenumber 
1
14
5
at5
53.!
boat

What kind of function could I write to identify the rows that do not consist of numbers only and delete them? The expected result is:

Housenumber 
1
14
5
Veraaa asked Dec 01 '25 17:12


2 Answers

df[!grepl("[^[:digit:]]", df$Housenumber), , drop = FALSE]

Explanation:

The regex [^[:digit:]] matches any non-digit character, i.e. the letters and symbols that disqualify a row.

The call

grepl("[^[:digit:]]", df$Housenumber)

returns a logical vector with one element per row: TRUE where the value contains at least one non-digit character, FALSE otherwise. Negating it with ! keeps only the rows that consist of digits. Note that grep() is not suitable here: it returns the indices of all matching elements, so testing whether the length of its result is zero gives a single TRUE or FALSE for the whole column, which selects either every row or none.

In this particular case, I prefer the answer given by @akrun, but my answer also works in the general case of filtering rows using any sort of regex.
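
As a minimal sketch of the regex filter applied to the sample data from the question (the data frame name df, the helper variable has_non_digit, and the use of grepl() are assumptions for illustration):

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the column as character
df <- data.frame(Housenumber = c("1", "14", "5", "at5", "53.!", "boat"),
                 stringsAsFactors = FALSE)

# TRUE for every value that contains at least one non-digit character
has_non_digit <- grepl("[^[:digit:]]", df$Housenumber)

# Keep the rows without such a character; drop = FALSE keeps the result a data frame
clean <- df[!has_non_digit, , drop = FALSE]
print(clean)
```

An anchored positive pattern such as grepl("^[[:digit:]]+$", df$Housenumber) should select the same rows here and, unlike the negated version, would also reject empty strings and NA values.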

Tim Biegeleisen answered Dec 04 '25 06:12


This can be done with as.numeric, which converts the non-numeric elements to NA (with a coercion warning). !is.na then gives a logical index that keeps only the rows where the conversion succeeded.

df1[!is.na(as.numeric(df1$Housenumber)), , drop = FALSE]
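
A short sketch of this approach on the question's sample data, assuming the column is stored as character (the variable names num and clean, and the use of suppressWarnings(), are illustrative additions):

```r
# Sample data from the question, stored as character
df1 <- data.frame(Housenumber = c("1", "14", "5", "at5", "53.!", "boat"),
                  stringsAsFactors = FALSE)

# as.numeric() warns about "NAs introduced by coercion"; the warning is expected here
num <- suppressWarnings(as.numeric(df1$Housenumber))

# Keep only the rows whose value could be converted to a number
clean <- df1[!is.na(num), , drop = FALSE]

# Optionally store the remaining values as numbers instead of character
clean$Housenumber <- as.numeric(clean$Housenumber)
print(clean)
```

Keep in mind that as.numeric also accepts values such as "1.5" or "1e3", so this keeps anything R can parse as a number, not only strings made up of digits.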
akrun answered Dec 04 '25 05:12