I am new to R and am trying to read a public Google spreadsheet into an R data frame with numeric columns. My problem seems to be that the exported spreadsheet has commas in large numbers, such as "13,061.422". The read.csv() function treats this as a factor. I tried stringsAsFactors=FALSE and colClasses=c(rep("numeric",7)) but neither worked. Is there a way to coerce the values with commas and decimals to numeric values, either within read.csv() or afterwards when they are treated as Factors in the R dataframe? Here is my code:
require(RCurl)
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0Agbdciapt4QZdE95UDFoNHlyNnl6aGlqbGF0cDIzTlE&single=true&gid=0&range=A1%3AG4928&output=csv", ssl.verifypeer=FALSE) #ssl.verifypeer=FALSE gets around certificate issues I don't understand.
fullmatrix <- read.csv(textConnection(myCsv))
str(fullmatrix)
which results in:
'data.frame': 4927 obs. of 7 variables:
$ wave. : Factor w/ 4927 levels "1,000.8900","1,002.8190",..: 4875 4874 4873 4872 4871 4870 4869 4868 4867 4866 ...
$ wavelength : Factor w/ 4927 levels "1,000.074","1,000.267",..: 1 2 3 4 5 6 7 8 9 10 ...
$ d2o : num 85.2 87.7 86.3 87.6 85.6 ...
$ di : num 54.3 55.8 54.9 55.6 54.9 ...
$ ddw : num 48.2 49.7 49.4 50.2 49.6 ...
$ ddw.old : num 53.3 55 53.9 54.8 53.7 ...
$ d2o.ddw.mix: num 65.8 67.9 67.2 68.4 66.8 ...
Thanks for any help! I am new to R, so guessing (hoping) this is an easy one!
Yes. Two methods. The easiest to understand at first is probably to use as.is=TRUE to preserve them as character vectors, and then use gsub to remove the commas and any currency symbols before converting to numeric. The second is a bit more difficult, but I think more kewl: create an as-method for the format you are using. Then you can use colClasses to do it in one step.
I see @EDi already did version #1 (using stringsAsFactors rather than as.is), so I will document strategy #2:
library(methods)
setClass("num.with.commas")
#[1] "num.with.commas"
setAs("character", "num.with.commas",
function(from) as.numeric(gsub(",", "", from)))
require(RCurl)
#Loading required package: RCurl
#Loading required package: bitops
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0Agbdciapt4QZdE95UDFoNHlyNnl6aGlqbGF0cDIzTlE&single=true&gid=0&range=A1%3AG4928&output=csv", ssl.verifypeer=FALSE)
fullmatrix <- read.csv(textConnection(myCsv),
    # two comma-formatted columns, then the five plain numeric ones (7 columns in all)
    colClasses=c(rep("num.with.commas",2), rep("numeric",5) ))
str(fullmatrix)
#--------------
'data.frame': 4927 obs. of 7 variables:
$ wave. : num 9999 9997 9995 9993 9992 ...
$ wavelength : num 1000 1000 1000 1001 1001 ...
$ d2o : num 85.2 87.7 86.3 87.6 85.6 ...
$ di : num 54.3 55.8 54.9 55.6 54.9 ...
$ ddw : num 48.2 49.7 49.4 50.2 49.6 ...
$ ddw.old : num 53.3 55 53.9 54.8 53.7 ...
$ d2o.ddw.mix: num 65.8 67.9 67.2 68.4 66.8 ...
As-methods perform coercion. There are many such methods in base R, such as as.list, as.numeric, and as.character. In each case they attempt to take input that is in one mode and make a sensible copy of it in a different mode. For instance, it makes sense to coerce a matrix to a dataframe because they both have two dimensions. It makes a bit less sense to coerce a dataframe to a matrix, but it does succeed, with loss of all the attributes of the columns and coercion to a common mode.
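To make the matrix/dataframe point concrete, here is a minimal sketch. The values and the column names are made up for illustration and are not the spreadsheet data.

```r
# A small data frame with one numeric and one character column (made-up values).
df <- data.frame(wavelength = c(1000.074, 1000.267),
                 label = c("a", "b"),
                 stringsAsFactors = FALSE)

# Matrix to data frame: each resulting column keeps its own type.
m <- matrix(1:4, nrow = 2)
m_df <- as.data.frame(m)
str(m_df)

# Data frame to matrix: every cell is coerced to one common type,
# which is character here because one column holds text.
mat <- as.matrix(df)
typeof(mat)
```

Because one column is text, the whole matrix ends up as character, which is the "coercion to a common mode" mentioned above.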
In the present case I am taking a character string as input, removing any commas, and coercing the character values to numeric. Then I use read.table's 'colClasses' argument (in this case by way of read.csv) to dispatch to the as-method I registered with setAs. You may want to go to the help(setAs) page for more details. The S4 class system confuses a lot of people, me included. This is about the only area of success I have had with S4 methods.
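If you want to try the dispatch without downloading anything, here is a self-contained sketch. The csv_text sample and the sample_df name are my own stand-ins (assumptions, not the real spreadsheet); the class name and the setAs call are the same as above.

```r
library(methods)

# Register a virtual class and a coercion from character to it.
setClass("num.with.commas")
setAs("character", "num.with.commas",
      function(from) as.numeric(gsub(",", "", from)))

# The registered method can be called directly through as():
one_value <- as("13,061.422", "num.with.commas")

# Made-up sample standing in for the downloaded CSV text (hypothetical data).
csv_text <- 'wave,wavelength,d2o
"9,999.2590","1,000.074",85.2
"9,997.3300","1,000.267",87.7'

# read.csv reads the first two columns as character, then coerces them
# with the method registered for "num.with.commas".
sample_df <- read.csv(text = csv_text,
                      colClasses = c("num.with.commas", "num.with.commas", "numeric"))
str(sample_df)
```

The direct as() call is only there to show what colClasses ends up invoking for each column; in normal use you would just pass the class name to read.csv.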
Read the data in with stringsAsFactors = FALSE, remove the commas (with gsub()), and convert to numeric (with as.numeric()):
> fullmatrix <- read.csv(textConnection(myCsv), stringsAsFactors = FALSE)
> str(fullmatrix)
'data.frame': 4927 obs. of 7 variables:
$ wave. : chr "9,999.2590" "9,997.3300" "9,995.4010" "9,993.4730" ...
$ wavelength : chr "1,000.07410549122" "1,000.26707130804" "1,000.46011160533" "1,000.65312629553" ...
$ d2o : num 85.2 87.7 86.3 87.6 85.6 ...
$ di : num 54.3 55.8 54.9 55.6 54.9 ...
$ ddw : num 48.2 49.7 49.4 50.2 49.6 ...
$ ddw.old : num 53.3 55 53.9 54.8 53.7 ...
$ d2o.ddw.mix: num 65.8 67.9 67.2 68.4 66.8 ...
> fullmatrix$wave. <- as.numeric(gsub(",", "", fullmatrix$wave.))
> fullmatrix$wavelength <- as.numeric(gsub(",", "", fullmatrix$wavelength))
> str(fullmatrix)
'data.frame': 4927 obs. of 7 variables:
$ wave. : num 9999 9997 9995 9993 9992 ...
$ wavelength : num 1000 1000 1000 1001 1001 ...
$ d2o : num 85.2 87.7 86.3 87.6 85.6 ...
$ di : num 54.3 55.8 54.9 55.6 54.9 ...
$ ddw : num 48.2 49.7 49.4 50.2 49.6 ...
$ ddw.old : num 53.3 55 53.9 54.8 53.7 ...
$ d2o.ddw.mix: num 65.8 67.9 67.2 68.4 66.8 ...
> fullmatrix[1, 1]
[1] 9999.259
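If there are more than a couple of such columns, the same gsub() step can be applied to all of them at once. This is only a sketch: the small data frame is made up to stand in for the real one, and strip_commas is a helper name I invented. It assumes every character (or factor) column really holds comma-formatted numbers; genuinely textual columns would turn into NA with a warning.

```r
# Made-up stand-in for the result of read.csv(..., stringsAsFactors = FALSE).
fullmatrix <- data.frame(wave. = c("9,999.2590", "9,997.3300"),
                         wavelength = c("1,000.074", "1,000.267"),
                         d2o = c(85.2, 87.7),
                         stringsAsFactors = FALSE)

# Hypothetical helper: drop the commas, then convert to numeric.
# as.character() makes it safe for factor columns as well.
strip_commas <- function(x) as.numeric(gsub(",", "", as.character(x)))

# Pick out the character and factor columns and convert only those.
needs_fix <- vapply(fullmatrix,
                    function(x) is.character(x) || is.factor(x),
                    logical(1))
fullmatrix[needs_fix] <- lapply(fullmatrix[needs_fix], strip_commas)
str(fullmatrix)
```

The as.character() call matters for the factor case from the question: calling as.numeric() directly on a factor returns the internal level codes, not the values shown.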