
Plot data from SparkR DataFrame

I have an avro file which I am reading as follows:

avroFile <- read.df(sqlContext, "avro", "com.databricks.spark.avro")

This file has lat/lon columns, but I am not able to plot them as I would with a regular data frame. I am also unable to access a column using the '$' operator.

For example:

avroFile$latitude

Any help with Avro files and operating on them in R is appreciated.

Vishal R asked Mar 15 '26 08:03


2 Answers

If you want to use ggplot2 for plotting, try ggplot2.SparkR. This package lets you pass a SparkR DataFrame directly as input to the ggplot() function.

https://github.com/SKKU-SKT/ggplot2.SparkR

Jae answered Mar 17 '26 21:03


You won't be able to plot it directly. A SparkR DataFrame is not compatible with functions that expect a data.frame as input. It is not even a data structure in the strict sense, but simply a recipe for how to process the input data. It is materialized only when you execute an action.

If you want to plot it, you'll have to collect it first. Beware that this fetches all the data to the local machine, so it is typically something you want to avoid on a full data set.
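
A minimal sketch of the collect-then-plot approach, assuming avroFile is the SparkR DataFrame from the question and that the coordinate columns are named latitude and longitude (only latitude appears in the question, so longitude is a guess). The 1% sampling fraction is arbitrary and only illustrates one way to avoid pulling the full data set to the driver:

```r
# Assumes a running SparkR session and the avroFile DataFrame from the question.
library(SparkR)

# Keep only the columns needed for the plot so less data leaves the cluster.
# "longitude" is an assumed column name; adjust it to match your schema.
coords <- select(avroFile, "latitude", "longitude")

# Take a random sample (about 1%, without replacement) rather than
# collecting everything. The fraction is an arbitrary illustration.
coordsSample <- sample(coords, FALSE, 0.01)

# collect() is an action: it runs the job and returns a local R data.frame.
localCoords <- collect(coordsSample)

# localCoords is an ordinary data.frame, so the usual plotting functions work.
plot(localCoords$longitude, localCoords$latitude,
     xlab = "longitude", ylab = "latitude")
```

Note that avroFile$latitude on the SparkR DataFrame returns a Column expression rather than the values themselves, which is why the values only become available after collect().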

zero323 answered Mar 17 '26 23:03