apache-spark tutorials and guides

"No FileSystem for scheme: gs" when running a Spark job locally

Jun 14, 2022

Running Spark jobs on a YARN cluster with additional files

Oct 30, 2022

apache-spark hdfs hadoop-yarn

Append a new column to an existing parquet file

Oct 17, 2022

apache-spark apache-spark-sql parquet

Spark reading python3 pickle as input

Nov 18, 2022

python apache-spark serialization pyspark rdd

Why do columns change to nullable in Apache Spark SQL?

Oct 22, 2022

apache-spark apache-spark-sql apache-spark-dataset

Save and load two ML models in pyspark

Apr 04, 2022

python apache-spark pyspark apache-spark-ml

Spark Structured streaming: multiple sinks

Sep 26, 2022

apache-spark spark-structured-streaming

Spark, Alternative to Fat Jar

Dec 11, 2019

java scala apache-spark gradle amazon-emr

Extract words from a string column in spark dataframe

Feb 21, 2022

regex scala apache-spark apache-spark-sql

SQL over Spark Streaming

Oct 03, 2022

apache-spark spark-streaming

Get current task ID in Spark in Java

Oct 24, 2022

java apache-spark

Can I use Spark without Hadoop for development environment?

Nov 15, 2022

hadoop apache-spark filesystems

spark.ml StringIndexer throws 'Unseen label' on fit()

Oct 21, 2022

apache-spark dataframe pyspark apache-spark-sql apache-spark-ml

Scala: why do Doubles consume less memory than Floats in this case?

Mar 31, 2022

scala memory apache-spark scala-collections

Filtering rows based on column values in a Spark DataFrame (Scala)

Feb 09, 2022

scala apache-spark dataframe apache-spark-sql

How to add a column to Dataset without converting from a DataFrame and accessing it?

Oct 19, 2022

scala apache-spark

AWS Glue write parquet with partitions

Feb 26, 2022

amazon-web-services apache-spark pyspark aws-glue

PySpark: partitioning data using partitionBy

Oct 14, 2022

python apache-spark pyspark partitioning rdd

Default number of executors and cores for spark-shell

Oct 30, 2022

apache-spark

How to calculate Percentile of column in a DataFrame in spark?

Apr 13, 2022

scala apache-spark apache-spark-sql spark-dataframe
