
Getting dataframe directly from JSON-file?

Tags:

json

r

First, let me thank everybody who contributes to Stackoverflow and R! I'm one of those R users who is not so good at programming but bravely tries to use it for work, so the issue below is probably trivial...

Here's the problem. I need to import files in JSON-format to R:

# library(plyr)
# library(RJSONIO)
# lstJson <- fromJSON("JSON_test.json")        #This is the file I read
# dput(lstJson)                                              #What I did to get the txtJson below, for the benefit of testing.

txtJson <- structure(list(version = "1.1", result = structure(list(warnings = structure(list(), class = "AsIs"), fields = list(structure(list(info = "", rpl = 15, name = "time", type = "timeperiod"), .Names = c("info", "rpl", "name", "type")), structure(list(info = "", name = "object", type = "string"), .Names = c("info", "name", "type")), structure(list(info = "Counter1", name = "Counter1", type = "int"), .Names = c("info", "name", "type")), structure(list( info = "Counter2", name = "Counter2", type = "int"), .Names = c("info", "name", "type"))), timeout = 180, name = NULL, data = list( list(list("2011-05-01 17:00", NULL), list("Total", NULL), list(8051, NULL), list(44, NULL)), list(list("2011-05-01 17:15", NULL), list("Total", NULL), list(8362, NULL), list( 66, NULL))), type = "AbcDataSet"), .Names = c("warnings", "fields", "timeout", "name", "data", "type"))), .Names = c("version", "result"))

dfJson <- ldply(txtJson, data.frame)  

What I need is a data frame similar to this:

time  object  Counter1  Counter2  
2011-05-01 17:00  Total  8051  44  
2011-05-01 17:15  Total  8362  66 

But instead I get

"Error in data.frame("2011-05-01 17:00", NULL, check.names = FALSE, stringsAsFactors = TRUE) : 
  arguments imply differing number of rows: 1, 0"

I get the same error if I use the lstJson.

I'm not sure if RJSONIO is supposed to be "smart enough" to parse files like this, or if I have to manually read the first line of the file, set column-types etc. The reason I'm not using CSV is that I want to "automatically" get dates in date-format, etc.

Thanks, /Chris

Asked by Chris, Nov 29 '25 22:11


1 Answer

Looking at the structure of txtJson you see that all of the useful bits are in txtJson$result$data:

> sapply( txtJson$result$data, unlist )
     [,1]               [,2]              
[1,] "2011-05-01 17:00" "2011-05-01 17:15"
[2,] "Total"            "Total"           
[3,] "8051"             "8362"            
[4,] "44"               "66"              
> t(sapply( txtJson$result$data, unlist ))
     [,1]               [,2]    [,3]   [,4]
[1,] "2011-05-01 17:00" "Total" "8051" "44"
[2,] "2011-05-01 17:15" "Total" "8362" "66"
> as.data.frame(t(sapply( txtJson$result$data, unlist )) )
                V1    V2   V3 V4
1 2011-05-01 17:00 Total 8051 44
2 2011-05-01 17:15 Total 8362 66

In the process of getting these as unlisted vectors and then passing them to 'as.data.frame', they all become class 'factor' (in R versions before 4.0.0, where stringsAsFactors defaults to TRUE), so there is probably additional effort needed to re-class these values. You can instead use:

data.frame(t(sapply( txtJson$result$data, unlist )) ,stringsAsFactors=FALSE)

And they would all be 'character'
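
To get closer to the data frame asked for in the question, the column names and types could be taken from the fields list that sits next to the data. The following is only a sketch under a few assumptions: `result` is a trimmed stand-in for txtJson$result, the mapping of "int" to integer and "timeperiod" to POSIXct is my guess at what those type labels mean, and the "UTC" time zone is arbitrary.

```r
# Stand-in for txtJson$result from the question, trimmed to the parts used here.
result <- list(
  fields = list(
    list(info = "", rpl = 15, name = "time", type = "timeperiod"),
    list(info = "", name = "object", type = "string"),
    list(info = "Counter1", name = "Counter1", type = "int"),
    list(info = "Counter2", name = "Counter2", type = "int")
  ),
  data = list(
    list(list("2011-05-01 17:00", NULL), list("Total", NULL),
         list(8051, NULL), list(44, NULL)),
    list(list("2011-05-01 17:15", NULL), list("Total", NULL),
         list(8362, NULL), list(66, NULL))
  )
)

# One row per record; unlist() drops the NULL placeholders and
# coerces every value to character.
dfJson <- data.frame(t(sapply(result$data, unlist)), stringsAsFactors = FALSE)

# Column names come from the "name" entry of each field description.
names(dfJson) <- sapply(result$fields, function(f) f$name)

# Convert each column according to its declared type (assumed mapping).
types <- sapply(result$fields, function(f) f$type)
for (i in seq_along(types)) {
  if (types[i] == "int") {
    dfJson[[i]] <- as.integer(dfJson[[i]])
  } else if (types[i] == "timeperiod") {
    dfJson[[i]] <- as.POSIXct(dfJson[[i]], format = "%Y-%m-%d %H:%M", tz = "UTC")
  }
}

str(dfJson)
```

Columns whose type is not listed in the loop (here "string") are simply left as character.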

As far as importing CSV files goes, read.table()'s colClasses argument will accept "POSIXlt" or "POSIXct" as known types. The rule, I believe, is that there must be an as.<class> method available. Here's a minimal example:

> read.table(textConnection("2011-05-01 17:00"), sep=",", colClasses="POSIXct")
                   V1
1 2011-05-01 17:00:00
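
The same idea extends to a whole table. As a rough sketch, assuming the data had been exported as CSV with a header row (the `csv_text` string below is made up to match the desired output in the question), one class per column can be passed to colClasses:

```r
# Hypothetical CSV version of the data from the question.
csv_text <- "time,object,Counter1,Counter2
2011-05-01 17:00,Total,8051,44
2011-05-01 17:15,Total,8362,66"

# colClasses takes one class name per column, in column order.
dfCsv <- read.csv(text = csv_text,
                  colClasses = c("POSIXct", "character", "integer", "integer"))

str(dfCsv)
```

The times are interpreted in the session's local time zone here, since no time zone is given.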
Answered by IRTFM, Dec 02 '25 14:12


