apache-spark tutorials and guides

Read a JSON file as a PySpark DataFrame?

Dec 10, 2022

Spark throwing ArrayIndexOutOfBoundsException when parallelizing list

Dec 10, 2022

java arrays list apache-spark indexoutofboundsexception

How to integrate Palantir Foundry with Amazon S3 or HDFS

Dec 09, 2022

apache-spark amazon-s3 palantir-foundry foundry-data-connection

PySpark: merge multiple columns into a JSON column

Dec 10, 2022

python dataframe apache-spark pyspark

Spark cannot read files stored on AWS S3 in Frankfurt region (Ireland region works fine)

Dec 08, 2022

amazon-web-services amazon-s3 apache-spark

Reading from the Google Cloud Storage gs:// filesystem from a local Spark instance

Dec 08, 2022

apache-spark google-cloud-storage google-cloud-platform

spark-shell error on Windows - can it be ignored if not using Hadoop?

Dec 07, 2022

apache-spark

Apache Spark: Convert a column with a JSON string to a new DataFrame in Scala [duplicate]

Dec 07, 2022

json scala apache-spark apache-spark-sql

Read XML in Spark

Dec 08, 2022

xml apache-spark dataframe pyspark apache-spark-xml

The difference between "one executor per core" and "one executor with multiple cores"

Dec 08, 2022

apache-spark pyspark

Apache Spark job failed immediately without retry; setting maxFailures doesn't work

Dec 06, 2022

apache-spark failover self-healing

How to configure Hive to use Spark?

Dec 06, 2022

hadoop mapreduce hive apache-spark

How to execute spark-shell from file with nohup?

Dec 06, 2022

apache-spark

How to use SQL query to define table in dbtable?

Dec 05, 2022

jdbc apache-spark apache-spark-sql

How to create an empty DataFrame in Spark

Dec 06, 2022

scala apache-spark apache-spark-sql avro spark-avro

PySpark random forest feature importance mapping after column transformations

Dec 05, 2022

apache-spark pyspark apache-spark-sql apache-spark-mllib

Describe a DataFrame in PySpark

Dec 06, 2022

python pandas apache-spark pyspark

Why does spark-ec2 fail with ERROR: Could not find any existing cluster?

Dec 04, 2022

amazon-web-services amazon-ec2 apache-spark

Using Scala to dump results processed by Spark to HDFS

Dec 05, 2022

scala hadoop hdfs apache-spark

Serializing RDD

Dec 05, 2022

java apache-spark rdd

New posts in apache-spark