apache-spark tutorials and guides

Select columns that satisfy a condition

Sep 14, 2022

How to convert unix timestamp to the given timezone with Spark

Feb 16, 2022

scala apache-spark apache-spark-sql timezone unix-timestamp

Why does the spark-ml ALS model return NaN and negative-number predictions?

Sep 15, 2021

apache-spark pyspark apache-spark-mllib

Apply custom function to cells of selected columns of a data frame in PySpark

May 05, 2021

python apache-spark pyspark spark-dataframe

Spark SQL - reading csv with schema

Sep 05, 2022

scala validation csv apache-spark schema

Combine multiple raw files into single parquet file

Dec 10, 2021

apache-spark pyspark etl aws-glue

Spark writing/reading to/from S3 - Partition Size and Compression

Sep 16, 2022

amazon-web-services apache-spark amazon-s3 gzip

Authentication for Spark standalone cluster

May 01, 2022

security hadoop authentication apache-spark pyspark

Split a Spark column of Array[String] into columns of String

Nov 03, 2022

arrays string apache-spark split

Pickling monkey-patched Keras model for use in PySpark

Jun 20, 2022

apache-spark pyspark keras pickle monkeypatching

Retain raw JSON as column in Spark DataFrame on read/load?

Aug 25, 2022

json apache-spark apache-spark-sql

Why do I get so many empty partitions when repartitioning a Spark DataFrame?

Nov 18, 2022

apache-spark pyspark apache-spark-sql partitioning

Apache Spark vs Spring Cloud Data Flow [closed]

Aug 27, 2022

apache-spark spring-cloud-dataflow

Error running spark on databricks: constructor public XXX is not whitelisted

Nov 02, 2022

apache-spark pyspark databricks

Pass additional arguments to foreachBatch in pyspark

May 31, 2022

apache-spark pyspark spark-structured-streaming databricks

How to remove elements from an array Column in Spark?

Sep 16, 2022

arrays scala apache-spark dataframe seq

Is a Spark RDD deterministic for the set of elements in each partition?

Sep 14, 2022

apache-spark persistence rdd

Spark SQL - Regex for matching only numbers

Nov 10, 2022

regex dataframe apache-spark pyspark apache-spark-sql

Spark window partition function taking forever to complete

Sep 14, 2022

scala performance dataframe apache-spark apache-spark-sql

Why does Spark report spark.SparkException: File ./someJar.jar exists and does not match contents of

Apr 12, 2022

apache-spark
