apache-spark tutorials and guides

Spark/Scala Opening Zipped CSV Files

Sep 23, 2022

scala apache-spark

IOException: Cannot run program "javac" when running "sudo ./sbt/sbt compile" in Spark?

Nov 08, 2022

sbt apache-spark

Import TSV file in Spark

Nov 19, 2022

scala apache-spark

Spark Streaming with large number of streams and models used for analytical processing of RDDs

Nov 09, 2022

apache-spark redis spark-streaming

Apache Spark with custom InputFormat for HadoopRDD

Oct 12, 2017

hadoop apache-spark

How to divide RDD data into two in Spark?

Sep 12, 2022

python apache-spark pyspark rdd

Spark: Saving JavaRDD to Cassandra

Jun 30, 2022

java apache-spark cassandra rdd spark-cassandra-connector

Spark combineByKey Java lambda expression

Jul 08, 2022

java lambda apache-spark

Scala error: "could not find implicit value for parameter"

Mar 07, 2019

scala apache-spark scala-breeze

How to restrict processing to a specified number of cores in Spark standalone

Aug 11, 2017

scala apache-spark

How to calculate the mean of each pair in an RDD consisting of (Key, [Value]) pairs in Spark?

Jul 12, 2022

scala apache-spark

How to create a VertexId in Apache Spark GraphX using a Long data type?

Nov 04, 2022

scala apache-spark spark-graphx

Spark lists all leaf nodes even in partitioned data

Nov 12, 2022

apache-spark amazon-s3 apache-spark-sql partitioning parquet

Remove duplicates from a dataframe in PySpark

Sep 08, 2022

python apache-spark pyspark duplicates pyspark-dataframes

How to get rid of derby.log and metastore_db from the Spark shell

Aug 30, 2022

apache-spark derby

What is the difference between HashingTF and CountVectorizer in Spark?

Jun 05, 2022

apache-spark apache-spark-mllib apache-spark-ml

How to map features from the output of a VectorAssembler back to the column names in Spark ML?

Sep 07, 2022

python apache-spark machine-learning pyspark apache-spark-ml

How to add a Spark Dataframe to the bottom of another dataframe?

Aug 28, 2022

scala apache-spark dataframe

Joining two DataFrames in Spark SQL and selecting columns of only one

Aug 19, 2022

scala apache-spark apache-spark-sql

How to group by time interval in Spark SQL

Sep 22, 2022

sql apache-spark apache-spark-sql
