apache-spark tutorials and guides

How to change a column value in Spark SQL

Sep 05, 2022

How to write streaming dataset to Kafka?

Mar 08, 2022

apache-spark apache-kafka spark-structured-streaming

Kafka with Spark 2.1 Structured Streaming - cannot deserialize

Oct 24, 2022

apache-spark pyspark deserialization apache-spark-sql spark-streaming

I am getting an error while creating a simple RDD in Spark

Jan 31, 2022

python apache-spark rdd

Spark Pipeline error

Jun 18, 2021

python apache-spark pyspark pyspark-sql

Spring autoconfiguration class is missing in META-INF/spring.factories

Feb 18, 2022

java spring maven apache-spark

NoClassDefFoundError: Could not initialize XXX class after deploying on Spark standalone cluster

Oct 20, 2022

scala apache-spark deployment spark-streaming spark-submit

How to cache partitioned dataset and use in multiple queries?

Jun 20, 2022

java apache-spark apache-spark-sql

PySpark UDF high memory utilization

Nov 04, 2022

apache-spark pyspark

Enum equivalent in Spark Dataframe/Parquet

May 12, 2022

apache-spark parquet

Cumulative distinct count with Spark SQL

Nov 08, 2022

sql apache-spark apache-spark-sql

pyspark.sql.utils.IllegalArgumentException: "Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuild on Windows 10

Sep 06, 2022

apache-spark pyspark

How to handle categorical features in the latest Random Forest in Spark?

Sep 03, 2022

apache-spark apache-spark-mllib random-forest apache-spark-ml feature-engineering

What is the difference between sqlContext.read.load and sqlContext.read.text?

Sep 15, 2022

apache-spark pyspark apache-spark-sql spark-csv

Which would be a quicker (and better) tool for querying data stored in the Parquet format - Spark SQL, Athena or ElasticSearch?

Aug 21, 2022

performance apache-spark elasticsearch etl amazon-athena

How does Serialized RDD occupy less space in memory?

Feb 22, 2022

java apache-spark serialization

Error: Could not write class iw because it exceeds JVM code size limits. Method code too large

May 25, 2019

scala apache-spark apache-spark-sql

Scala: How to combine two data frames?

Aug 19, 2022

scala apache-spark apache-spark-sql

How to implement `except` in Apache Spark based on subset of columns?

Sep 15, 2022

scala apache-spark apache-spark-sql

How to convert a timestamp into a string (without changing the timezone)?

Jul 14, 2022

r apache-spark hive timestamp sparklyr

New posts in apache-spark