
Unicode Variable Names in R

I was working on a toy project and tried using some Unicode variable names to match a paper I was trying to implement.

The following code works fine on R 3.4.3 on Windows (RStudio version 1.1.456) and R 3.5.1 on OSX:

> µ <- function(ß, n) ß * n
> µ(2, 3)
[1] 6

This code, however, gives the following error (α was typed as ALT+224):

> α <- 2
Error: unexpected input in "\"

The file was saved as UTF-8, so this is surprising to me.

make.names is consistent with the results above:

> make.names('µ')
[1] "µ"
> make.names('α')
[1] "a"

What is the rule for non-ASCII letters? Why are mu (µ) and scharfes S (ß) accepted but alpha (α) isn't?

Edit: Output of sessionInfo()

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3    yaml_2.2.0 

Edit2: It seems like Sys.setlocale should be the answer, but here is what happens when I try this:

> Sys.setlocale("LC_ALL", 'en_US.UTF-8')
[1] ""
Warning message:
In Sys.setlocale("LC_ALL", "en_US.UTF-8") :
  OS reports request to set locale to "en_US.UTF-8" cannot be honored
Josh Rumbut asked Oct 26 '25 13:10


1 Answer

Working with Ben Bolker, we determined that the issue was that the current session was using the Windows-1252 character encoding, which has some non-ASCII characters but not many: it includes µ and ß but has no Greek α. This is despite the fact that RStudio saved the file as UTF-8.
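
As a rough illustration of that point (a sketch, not something from the original debugging session), iconv() can be used to check which characters survive a conversion to Windows-1252. The encoding name "CP1252" and the vector names below are my own choices, and the exact behaviour may depend on the iconv implementation your R build uses:

```r
# Code points are written as \u escapes so this snippet is plain ASCII
# and can be pasted into a console regardless of the session encoding.
chars <- c(mu = "\u00b5", eszett = "\u00df", alpha = "\u03b1")

# iconv() returns NA for any element it cannot convert
# (no substitution is requested, so sub keeps its default of NA).
converted <- iconv(chars, from = "UTF-8", to = "CP1252")

# TRUE means the character has a slot in Windows-1252.
representable <- setNames(!is.na(converted), names(chars))
print(representable)
```

A character that comes back as NA here has no place in the session's code page, which would be consistent with α being rejected (or mangled to "a") while µ and ß go through.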

Changing the locale of a running R session to a UTF-8 one does not seem to be possible, at least on Windows, where Sys.setlocale only produces a warning (see the question).
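
To see which encoding a session is actually using before trying any of this, something like the following should work. It is only a minimal sketch: the fields returned by l10n_info() vary a little by platform, so I only rely on the "UTF-8" entry here.

```r
# Ask R what it believes about the current locale's encoding.
info <- l10n_info()

# TRUE only when the session is running in a UTF-8 locale.
is_utf8 <- isTRUE(info[["UTF-8"]])

# The LC_CTYPE category is the one that governs character handling.
ctype <- Sys.getlocale("LC_CTYPE")

cat("UTF-8 session:", is_utf8, "\n")
cat("LC_CTYPE:", ctype, "\n")
```

In the session from the question, LC_CTYPE was English_United States.1252, so a check like this would be expected to report a non-UTF-8 session there.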

I have a partial solution. If you are given a file like this and want to run it and have interactive access to the results, the following will mostly work (variable names will be translated to Windows-1252):

> source('utf-8-file.r', encoding='UTF-8')

I would be very excited to see a better solution: one that allows editing and running the file, and entering such snippets into the RStudio console on Windows.

Josh Rumbut answered Oct 28 '25 01:10