I have the Maven dependencies spark-sql_2.1.0 and spark-hive_2.1.0. However, when I try to import org.apache.spark.sql.DataFrame, I get an error, while importing
org.apache.spark.sql.SQLContext works without any errors. Why?
In Spark 2.x, DataFrame became a Scala type alias: type DataFrame = Dataset[Row]. Java doesn't have type aliases, so the name DataFrame is not available in Java. You should now use the type Dataset<Row> instead, so import both org.apache.spark.sql.Dataset and org.apache.spark.sql.Row.
import org.apache.spark.sql.DataFrame
works in Scala but not in Java, because in Spark 2.x DataFrame is only a Scala type alias and not a class that Java can import. In Java you can use Dataset<Row> instead, as explained in the Spark SQL, DataFrames and Datasets Guide.
You can import the following:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
and use them as:
Dataset<Row> peopleDataFrame = spark.createDataFrame(rowRDD, schema);
Or
Dataset<Row> peopleDF = spark.createDataFrame(peopleRDD, Person.class);
Or
Dataset<Row> usersDF = spark.read().load("examples/src/main/resources/users.parquet");
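For completeness, here is a minimal self-contained sketch of how the imports and Dataset<Row> fit together in a Java program. It assumes Spark 2.x (spark-sql) is on the classpath and runs in local mode; the class name DatasetRowExample, the column names, and the sample rows are made up for illustration and are not from the question.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical example class; assumes Spark 2.x is on the classpath.
public class DatasetRowExample {
    public static void main(String[] args) {
        // Local-mode session, used here only so the sketch runs without a cluster.
        SparkSession spark = SparkSession.builder()
                .appName("DatasetRowExample")
                .master("local[*]")
                .getOrCreate();

        // Illustrative schema: a string column and an integer column.
        StructType schema = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("name", DataTypes.StringType, false),
                DataTypes.createStructField("age", DataTypes.IntegerType, false)));

        // Made-up sample rows matching the schema above.
        List<Row> rows = Arrays.asList(
                RowFactory.create("Alice", 30),
                RowFactory.create("Bob", 25));

        // In Java the result type is Dataset<Row>; there is no DataFrame class to import.
        Dataset<Row> peopleDF = spark.createDataFrame(rows, schema);

        peopleDF.printSchema();
        peopleDF.show();

        spark.stop();
    }
}
```

Anywhere Scala code or older Spark 1.x Java code refers to DataFrame, the Java equivalent in Spark 2.x is declared as Dataset<Row>.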