I have an avro file which I am reading as follows:
avroFile <-read.df(sqlContext, "avro", "com.databricks.spark.avro")
This file as lat/lon columns but I am not able to plot them like a regular dataframe. Neither am I able to access the column using the '$' operator.
ex.
avroFile$latitude
Any help regarding avro files and operation on them using R are appreciated.
If you want to use ggplot2 for plotting, try ggplot2.SparkR. This package allows you to take SparkR DataFrame directly as input for ggplot() function call.
https://github.com/SKKU-SKT/ggplot2.SparkR
And you won't be able to plot it directly. SparkR DataFrame is not compatible with functions which expect data.frame as an input. This is not even a data structure in a strict sense but simply a recipe how to process input data. It is materialized only when you execute an action.
If you want to plot it you'll have collect it first.. Beware that it fetches all the data the local machine so typically it is something you want to avoid on full data set.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With