apache-spark tutorials and guides

How to process data in chunks/batches with Kafka Streams?

Nov 16, 2022

Spark: UDF executed many times

Aug 17, 2022

scala apache-spark apache-spark-sql

Problems when writing parquet with timestamps prior to 1900 in AWS Glue 3.0

Jun 28, 2022

amazon-web-services apache-spark pyspark aws-glue

How do you perform blocking I/O in an Apache Spark job?

Aug 17, 2022

scala parallel-processing apache-spark

How to convert a matrix to RDD[Vector] in Spark

Jan 01, 2017

scala apache-spark

java.lang.NoSuchMethodError Jackson databind and Spark

Sep 12, 2022

json scala jackson apache-spark

Hadoop 2.6 Connecting to ResourceManager at /0.0.0.0:8032

Mar 02, 2022

java hadoop apache-spark resourcemanager

Apply function to each row of Spark DataFrame

Mar 14, 2022

apache-spark apache-spark-sql

Multiple Spark applications with HiveContext

Nov 10, 2022

apache-spark hive pyspark

How to optimize Spark SQL to run in parallel

Oct 29, 2022

sql apache-spark parallel-processing apache-spark-sql hadoop-yarn

SnakeYAML and Spark result in an inability to construct objects

Nov 08, 2019

scala apache-spark snakeyaml

Reading in multiple files compressed in tar.gz archive into Spark [duplicate]

Sep 14, 2022

scala apache-spark gzip rdd

Spark is not using all configured memory

Sep 16, 2022

scala apache-spark bigdata

Why Is a Spark Query (Load) from Oracle So Slow Compared to Sqoop?

Nov 09, 2022

oracle apache-spark apache-spark-sql spark-dataframe

Livy Server: return a dataframe as JSON?

Jun 04, 2021

json apache-spark cloudera apache-spark-2.0 livy

Online learning of LDA model in Spark

May 02, 2022

apache-spark machine-learning apache-spark-mllib lda apache-spark-ml

Can Spark read data directly into a nested case class?

Oct 22, 2022

scala apache-spark apache-spark-dataset

Using Airflow to run Spark Streaming jobs?

Sep 25, 2022

apache-spark streaming airflow

Should cache and checkpoint be used together on Datasets? If so, how does this work under the hood?

Sep 15, 2022

apache-spark apache-spark-sql apache-spark-dataset

PySpark: DecimalType multiplication precision loss

Nov 03, 2022

python apache-spark pyspark
