apache-spark tutorials and guides

Is reading a CSV file from S3 into a Spark dataframe expected to be so slow?

Nov 19, 2022

apache-spark amazon-s3

How to set a custom environment variable in EMR so that it is available to a Spark application

Nov 06, 2022

amazon-web-services hadoop apache-spark environment-variables emr

How to list all tables in database using Spark SQL?

Oct 18, 2022

apache-spark pyspark apache-spark-sql

Spark Streaming: Micro batches Parallel Execution

Sep 15, 2022

hadoop apache-spark apache-kafka spark-streaming

Spark Structured Streaming Checkpoint Cleanup

Aug 20, 2022

apache-spark spark-structured-streaming

Collect rows as a list with group by in Apache Spark

Mar 22, 2022

java scala apache-spark apache-spark-sql spark-streaming

How to query MongoDB using Spark?

Mar 15, 2022

mongodb scala apache-spark

What is "Hadoop" - the definition of Hadoop?

May 13, 2018

hadoop hbase hdfs apache-spark hadoop-yarn

Spark - filter within map

Jul 22, 2017

java apache-spark

How to create InputDStream with offsets in PySpark (using KafkaUtils.createDirectStream)?

Oct 26, 2022

apache-spark apache-kafka pyspark

Batched API call inside Apache Spark?

Apr 12, 2022

apache-spark

Spark SQL is not converting timezone correctly [duplicate]

Feb 04, 2022

scala apache-spark hive timezone

What's the difference between explode function and operator?

Sep 05, 2022

apache-spark apache-spark-sql

What to do with "WARN TaskSetManager: Stage contains a task of very large size"?

Dec 26, 2020

apache-spark apache-spark-1.6

Delta Lake rollback

Nov 04, 2022

apache-spark rollback databricks delta-lake

How does Spark achieve parallelism within one task on multi-core or hyper-threaded machines?

Nov 06, 2022

multithreading apache-spark parallel-processing multicore

PySpark DataFrame group by filtering

May 31, 2019

python apache-spark pyspark apache-spark-sql

Spark Dataframe Random UUID changes after every transformation/action

Apr 18, 2022

scala apache-spark dataframe uuid

How to run Scala script using spark-submit (similarly to Python script)?

Jul 01, 2021

scala apache-spark

Aggregate rows of Spark DataFrame to String after groupby

Aug 19, 2022

scala apache-spark dataframe

New posts in apache-spark