New posts in apache-spark

Spark: Is there any rule of thumb about the optimal number of partitions of an RDD and its number of elements?

Spark SQL top n per group

org.apache.thrift.transport.TTransportException error while reading a large JSON file in Zeppelin (Scala)

How to split column of vectors into two columns?

Running steps of EMR in parallel

How does Spark handle data larger than cluster memory?

apache-spark

Dropping nested column of Dataframe with PySpark

Best practice to create SparkSession object in Scala to use both in unittest and spark-submit

Add months to date column in Spark dataframe

What does "pre-built for Apache Hadoop 2.7 and later" mean?

apache-spark

How can I obtain the DAG of an Apache Spark job without running it?

scala apache-spark

Why is there no map function for DataFrame in PySpark while the Spark equivalent has it?

apache-spark pyspark

How to set spark.driver.memory for Spark/Zeppelin on EMR

Is there a way to validate the syntax of raw spark sql query?

scala apache-spark

java.lang.UnsupportedOperationException: fieldIndex on a Row without schema is undefined: Exception on row.getAs[String]

scala apache-spark

How to select multiple columns of dataset, given a list of column names?

Spark decimal type precision loss

Comparison of a `float` to `np.nan` in Spark Dataframe

How do I get a Spark DataFrame to print its explain plan to a string?

How to find the max String length of a column in Spark using a DataFrame?