
New posts in apache-spark

Unable to submit Spring boot java application to Spark cluster

Write and run pyspark in IntelliJ IDEA

Spark Scala filter DataFrame where value not in another DataFrame

scala apache-spark

TypeError: 'JavaPackage' object is not callable

Spark Dataset and java.sql.Date

Spark pulling data into RDD or dataframe or dataset

Pyspark simple re-partition and toPandas() fails to finish on just 600,000+ rows

Spark error: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

scala apache-spark

Spark is inventing his own AWS secretKey

Yarn slave nodes are not communicating with master node?

Project_Bank.csv is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [110, 111, 13, 10]

Is there any way to get the output of Spark's Dataset.show() method as a string?

How to pivot streaming dataset?

UDF cause warning: CachedKafkaConsumer is not running in UninterruptibleThread (KAFKA-1894)

How can I force spark/hadoop to ignore the .gz extension on a file and read it as uncompressed plain text?

scala hadoop apache-spark gzip

pyspark equivalence of `df.loc`?

Calling a rest service from Spark

scala apache-spark rest

Does Spark support BigInteger type?

Failed to execute user defined function($anonfun$9: (string) => double) on using String Indexer for multiple columns

Spark: Prevent shuffle/exchange when joining two identically partitioned dataframes