How to determine if character string contains non-Roman characters in R

Question

What is the preferred way of determining if a string contains non-Roman/non-English (e.g., ないでさ) characters?

IRTFM · Accepted Answer

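You could use a regular expression with grep() to check for characters whose hex values fall outside the printable ASCII range (strictly speaking, \x7F is the non-printing DEL character; use \x7E as the upper bound if you want to exclude it):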

x <- 'ないでさ'
grep( "[^\x20-\x7F]",x )
#[1] 1
grep( "[^\x20-\x7F]","Normal text" )
#integer(0)

If you want non-printing ("control") characters to count as "English", extend the range of the character class in the first argument to grep so that it starts at "\x01". See ?regex for more information on character classes, and ?Quotes for how to specify characters as Unicode, hexadecimal, or octal values.
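
As a minimal sketch of that idea, the helper below wraps the character class in grepl(), which returns one logical value per element rather than the indices grep() gives. The name has_non_ascii and its allow_control argument are hypothetical, not part of base R. The test string is written with \u escapes (see ?Quotes) so the example does not depend on the encoding of the source file.

```r
# Hypothetical helper: TRUE for each element that contains a character
# outside the chosen ASCII range.
has_non_ascii <- function(x, allow_control = FALSE) {
  # "\x20-\x7F" is the range used above; "\x01-\x7F" also accepts control characters
  pattern <- if (allow_control) "[^\x01-\x7F]" else "[^\x20-\x7F]"
  grepl(pattern, x)
}

jp <- "\u306a\u3044\u3067\u3055"  # same string as x above, via Unicode escapes
has_non_ascii(c(jp, "Normal text", "tab\there"))
# [1]  TRUE FALSE  TRUE
has_non_ascii(c(jp, "Normal text", "tab\there"), allow_control = TRUE)
# [1]  TRUE FALSE FALSE
```

The tab in the third element is flagged only when control characters are not allowed, which is the difference the wider range makes.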

The R.oo package has conversion functions that may be useful:

library(R.oo)
?intToChar
?charToInt

The fact that Henrik Bengtsson saw fit to include these in his package suggests to me that there is no handy method for this in base/default R. He's a long-time useR/guRu.
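
For what it's worth, base R does provide utf8ToInt() and intToUtf8(), which convert between a single UTF-8 string and its integer code points. A rough sketch of using them for this check follows. The wrapper name any_above_ascii is made up for illustration, and the sketch assumes the input is valid text that enc2utf8() can convert.

```r
# Hypothetical wrapper: TRUE where any code point is above the ASCII range (> 127).
any_above_ascii <- function(x) {
  # utf8ToInt() handles one string at a time, so apply it element by element
  vapply(enc2utf8(x), function(s) any(utf8ToInt(s) > 127L), logical(1),
         USE.NAMES = FALSE)
}

utf8ToInt("\u306a")   # 12394, i.e. code point U+306A
intToUtf8(c(82, 33))  # "R!"
any_above_ascii(c("\u306a\u3044\u3067\u3055", "OrdinaryASCII"))
# [1]  TRUE FALSE
```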

Seeing the other answer prompted this attempt, which seems straightforward:

is.na( iconv( c(x, "OrdinaryASCII") , "", "ASCII") )
#[1]  TRUE FALSE
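
One possible way to package this is sketched below. The function name non_ascii_iconv is hypothetical. The empty string passed as the second argument above means "the current locale's encoding", so this version first converts the input with enc2utf8() and names "UTF-8" explicitly, on the assumption that you want the result to be independent of the session's locale. iconv() returns NA for elements it cannot convert, and that NA is what signals a non-ASCII character.

```r
# Hypothetical wrapper around the iconv() approach shown above.
non_ascii_iconv <- function(x) {
  converted <- iconv(enc2utf8(x), from = "UTF-8", to = "ASCII")
  # NA after conversion means some character had no ASCII equivalent;
  # genuine NA inputs are excluded so they are not reported as non-ASCII
  is.na(converted) & !is.na(x)
}

non_ascii_iconv(c("\u306a\u3044\u3067\u3055", "OrdinaryASCII", NA))
```

Behaviour depends on the platform's iconv implementation: some implementations substitute a placeholder character instead of failing, in which case no NA is produced and the regex approach is the safer choice.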

Tags:

string

regex

r

Brandon Loudermilk
