apache-spark tutorials and guides

How do I groupby and concat a list in a Dataframe Spark Scala

Nov 08, 2022

Spark & Scala: saveAsTextFile() exception

Oct 22, 2022

scala apache-spark hadoop apache-spark-sql bigdata

What does pyspark need psutil for? (faced "UserWarning: Please install psutil to have better support with spilling")?

May 22, 2021

python apache-spark pyspark

Spark Structured Streaming MemoryStream + Row + Encoders issue

Jul 21, 2022

scala apache-spark spark-structured-streaming

'CrossValidatorModel' object has no attribute 'featureImportances'

May 04, 2022

apache-spark machine-learning pyspark apache-spark-mllib random-forest

contains pyspark SQL: TypeError: 'Column' object is not callable

Apr 25, 2022

python apache-spark pyspark apache-spark-sql

Writing Spark DataFrame to Hive table through AWS Glue Data Catalog

May 04, 2022

amazon-web-services apache-spark amazon-s3 aws-glue aws-glue-data-catalog

How to use Pandas UDFs on macOS Mojave? (that fails due to [__NSPlaceholderDictionary initialize] may have been in progress...)

Sep 14, 2022

apache-spark pyspark pyspark-sql pyarrow

How to use gcs-connector and google-cloud-storage alongside in Scala

Apr 18, 2021

scala apache-spark google-cloud-storage

Spark Parquet read error : java.io.EOFException: Reached the end of stream with XXXXX bytes left to read

Jul 19, 2022

apache-spark apache-spark-sql parquet

How to convert a dictionary to dataframe in PySpark?

Sep 09, 2022

python apache-spark pyspark

Spark INLINE Vs. LATERAL VIEW EXPLODE differences?

Aug 22, 2022

sql arrays apache-spark hiveql explode

Using pyspark, how to expand a column containing a variable map to new columns in a DataFrame while keeping other columns?

Jun 22, 2022

apache-spark pyspark apache-spark-sql

Pyspark filter dataframe if column does not contain string

Nov 03, 2022

python apache-spark pyspark apache-spark-sql

Scala dependency on Spark installation

Oct 26, 2022

scala apache-spark

How to limit the number of concurrent map tasks per executor?

Oct 29, 2022

mapreduce apache-spark

Compare data in two RDD in spark

Feb 21, 2022

apache-spark scala-2.10 cloudera-cdh rdd

Scala error: '=' expected but ';' found

Jan 18, 2021

scala apache-spark

Cluster hangs in 'ssh-ready' state using Spark 1.2.0 EC2 launch script

Jul 08, 2022

amazon-web-services amazon-ec2 apache-spark apache-spark-1.2

How to construct ClassTag for Spark SQL DataFrame Mapping?

Jul 20, 2022

sql scala apache-spark rdd
