apache-spark tutorials and guides

How to insert a Spark Structured Streaming DataFrame into a Hive external table/location?

Nov 07, 2022

apache-spark hive spark-structured-streaming

Spark (Scala) filter array of structs without explode

Feb 18, 2022

scala apache-spark

Pure Java/Scala code for writing Tensorflow TFRecords data file

Oct 27, 2022

java scala apache-spark guava tensorflow

Spark: saveAsTextFile without compression

Oct 18, 2022

scala apache-spark compression

Encode an ADT / sealed trait hierarchy into Spark DataSet column

Apr 18, 2022

scala apache-spark apache-spark-dataset apache-spark-encoders

Where is df.cache() stored?

Oct 04, 2022

apache-spark apache-spark-sql

How to set up Spark with Zookeeper for HA?

Aug 31, 2022

apache-spark apache-zookeeper

Error running a job on Spark 1.4.0 with the Jackson module's ScalaObjectMapper

Aug 18, 2019

java scala intellij-idea apache-spark jackson

Is reading a CSV file from S3 into a Spark dataframe expected to be so slow?

Nov 19, 2022

apache-spark amazon-s3

How to set a custom environment variable in EMR so it is available to a Spark application?

Nov 06, 2022

amazon-web-services hadoop apache-spark environment-variables emr

How to list all tables in database using Spark SQL?

Oct 18, 2022

apache-spark pyspark apache-spark-sql

Spark Streaming: Parallel Execution of Micro-batches

Sep 15, 2022

hadoop apache-spark apache-kafka spark-streaming

Spark Structured Streaming Checkpoint Cleanup

Aug 20, 2022

apache-spark spark-structured-streaming

Collect rows as a list with group by in Apache Spark

Mar 22, 2022

java scala apache-spark apache-spark-sql spark-streaming

How to query MongoDB using Spark?

Mar 15, 2022

mongodb scala apache-spark

What is "Hadoop" - the definition of Hadoop?

May 13, 2018

hadoop hbase hdfs apache-spark hadoop-yarn

Spark: filter within map

Jul 22, 2017

java apache-spark

How to create InputDStream with offsets in PySpark (using KafkaUtils.createDirectStream)?

Oct 26, 2022

apache-spark apache-kafka pyspark

Batched API calls inside Apache Spark?

Apr 12, 2022

apache-spark

Spark SQL is not converting timezone correctly [duplicate]

Feb 04, 2022

scala apache-spark hive timezone
