apache-spark tutorials and guides

Garbage collection time very high in spark application causing program halt

Feb 10, 2023

scala apache-spark garbage-collection

How to use paste mode in pyspark shell?

Feb 11, 2023

python apache-spark pyspark

AWS EMR Spark save to S3 is very slow

Feb 10, 2023

amazon-s3 apache-spark emr

Object not serializable error on org.apache.avro.generic.GenericData$Record

Feb 10, 2023

apache-spark

Scala - Operation in case (x,y)=> x++y

Feb 09, 2023

scala apache-spark

value toDF is not a member of org.apache.spark.rdd.RDD

Feb 10, 2023

scala apache-spark apache-spark-sql

spark-shell dependencies, translate from sbt

Feb 09, 2023

scala apache-spark cassandra sbt

Spark Scala GraphX: Shortest path between two vertices

Feb 09, 2023

scala apache-spark spark-graphx

Why do join and group by affect the amount of data shuffled in Spark?

Feb 09, 2023

hadoop apache-spark

Spark - Strange behaviour with iterative algorithms

Feb 09, 2023

algorithm apache-spark iteration

Can't import sqlContext.implicits._ without an error through Jupyter

Feb 09, 2023

scala apache-spark amazon-ec2 apache-spark-sql jupyter

Apache Spark running out of memory with a smaller number of partitions

Feb 09, 2023

apache-spark

How to solve "Exception in thread "main" org.apache.spark.SparkException: Application application finished with failed status"?

Feb 09, 2023

apache-spark spark-streaming

Define spark udf by reflection on a String

Feb 09, 2023

scala apache-spark spark-dataframe udf scala-reflect

How to filter data using window functions in spark

Feb 09, 2023

scala apache-spark spark-dataframe window-functions

Why does SparkSession execute twice for one action?

Feb 08, 2023

java apache-spark apache-spark-sql

Spark: Removing rows which occur less than N times

Feb 09, 2023

apache-spark pyspark

NullPointerException in Spark RDD map when submitted as a spark job

Feb 08, 2023

scala hadoop apache-spark distributed-computing bigdata

Why is extracting an argument to a local variable in Spark considered safer?

Feb 09, 2023

scala function apache-spark distributed-computing bigdata

Transformation process in Apache Spark

Feb 09, 2023

apache-spark rdd
