apache-spark tutorials and guides

Spark-Scala Malformed Line Issue

Feb 20, 2026

scala apache-spark malformed

Firehose JSON -> S3 Parquet -> ETL Spark, error: Unable to infer schema for Parquet

Feb 19, 2026

apache-spark pyspark parquet amazon-kinesis aws-glue

Remove Vertices with no outgoing edges in GraphX

Feb 20, 2026

scala apache-spark spark-graphx

How to replace nulls in Vector column?

Feb 20, 2026

scala apache-spark apache-spark-sql apache-spark-1.6

Spark: Scala mocking, Task not serializable

Feb 20, 2026

scala apache-spark intellij-idea mockito spy

How to control file size in PySpark?

Feb 19, 2026

apache-spark pyspark apache-spark-sql

Error importing MulticlassClassificationEvaluator

Feb 19, 2026

python apache-spark pyspark apache-spark-mllib

Fastest and Most Effective Way to Iterate over a Large Dataset in Java Spark

Feb 19, 2026

java apache-spark iteration apache-spark-dataset

Guava JAR conflict when using Elasticsearch in a Spark job

Feb 17, 2026

hadoop elasticsearch apache-spark hadoop-yarn

Spark MLlib Decision Trees: Probability of labels by features?

Feb 17, 2026

python apache-spark decision-tree data-science

pyspark get value counts within a groupby

Feb 18, 2026

apache-spark pyspark

Spark DataFrame saves as partitioned table very slowly

Feb 17, 2026

apache-spark

Zeppelin notebook "error: not found: value %"

Feb 18, 2026

apache-spark apache-zeppelin

Inserts into Redshift using spark-redshift

Feb 18, 2026

apache-spark amazon-redshift amazon-redshift-spectrum

How to run C algorithm on Spark cluster? [closed]

Feb 18, 2026

c apache-spark distributed-computing

Spark streaming StreamingContext active count

Feb 18, 2026

hadoop apache-spark streaming spark-streaming

Configuring Spark Web-UI with nginx

Feb 18, 2026

nginx apache-spark

New posts in apache-spark