New posts in apache-spark

Error while using Hive context in spark : object hive is not a member of package org.apache.spark.sql

Scala/Spark version compatibility

scala apache-spark

Selecting only numeric/string columns names from a Spark DF in pyspark

How to allocate more executors per worker in Standalone cluster mode?

apache-spark

PySpark - Adding a Column from a list of values using a UDF

spark partition data writing by timestamp

Invalid Spark URL in local spark session

apache-spark

UnsatisfiedLinkError: no snappyjava in java.library.path when running Spark MLLib Unit test within Intellij

How can I efficiently read multiple json files into a Dataframe or JavaRDD?

java json apache-spark

spark error RDD type not found when creating RDD

What is the best way to define custom methods on a DataFrame?

java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession

java apache-spark

Apply same function to all fields of spark dataframe row

Pyspark: Replacing value in a column by searching a dictionary

pyspark and HDFS commands

Making histogram with Spark DataFrame column

Keep only duplicates from a DataFrame regarding some field

how to cast all columns of dataframe to string

Spark streaming multiple sources, reload dataframe

Mixed Effects Models in Spark or other technology