
New posts in apache-spark

Spark maven dependency breaks down spring-boot application

NoClassDefFoundError for joda DateTimeFormat

How to create a PySpark Schema for a list of tuples?

apache-spark pyspark schema

Databricks SQL - CTE namespace (bug?) with temporary views

How to strip headers from all files in RDD, where RDD = sc.textFile("s3n://bucket/*.csv")?

Spark LuceneRDD - how does it work

Why does collecting dataset fail with org.apache.spark.shuffle.FetchFailedException?

Using windowing functions in Spark

How to insert (not save or update) RDD into Cassandra?

cassandra apache-spark

Unable to load 25GB dataset in PySpark local mode with 56GB RAM free

How to load history data when starting Spark Streaming process, and calculate running aggregations

Linear regression with Spark MLlib only returns monotonic predictions

What is appName in SparkContext constructor and what is the usage of it?

hadoop apache-spark

How can I configure spark-submit (or DataProc) to download maven dependencies (jars) from GitHub packages?