New posts in apache-spark

org.apache.spark.SparkException: Task not serializable - JavaSparkContext

Spark DataFrame created from JavaRDD<Row> copies all columns data into first column

"unbound method textFile() must be called with SparkContext instance as first argument (got str instance instead)"

python apache-spark pyspark

How to use spark Naive Bayes classifier for text classification with IDF?

How to avoid "Not a file" exceptions when reading from HDFS with spark

apache-spark hdfs emr s3distcp

Understanding closures and parallelism in Spark

scala hadoop apache-spark

ClassNotFoundException thrown launching Spark Shell

apache-spark pyspark

Spark accumulators are confusing me.

scala apache-spark

Spark: group concat equivalent in scala rdd

When are files "splittable"?

using Word2VecModel.transform() does not work in map function

In Apache Spark, what is the difference between using mapPartitions and the combined use of a broadcast variable and map?

java scala apache-spark

Broadcast not happening while joining dataframes in Spark 1.6

How to drop rows with too many NULL values?

In Spark SQL, how do you register and use a generic UDF?

scala apache-spark udf

spark RDD sort by two values

scala sorting apache-spark rdd

Using spark dataFrame to load data from HDFS

How to view the logs of a spark job after it has completed and the context is closed?

Reading Json file using Apache Spark

Pyspark : Custom window function