I'm trying to start using Delta Lake with PySpark.
To be able to use Delta Lake, I invoke pyspark from the Anaconda shell prompt like this:
pyspark --packages io.delta:delta-core_2.11:0.3.0
Here is the Delta Lake quick-start guide I am following: https://docs.delta.io/latest/quick-start.html
All Delta Lake commands work fine from the Anaconda shell prompt.
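For example, the first write from the quick-start runs cleanly in that shell:

# Works in the pyspark shell started with --packages above
data = spark.range(0, 5)
data.write.format("delta").save("/tmp/delta-table")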
In a Jupyter notebook, however, any reference to a Delta Lake table gives an error. Here is the code I am running in the notebook:
df_advisorMetrics.write.mode("overwrite").format("delta").save("/DeltaLake/METRICS_F_DELTA")
spark.sql("create table METRICS_F_DELTA using delta location '/DeltaLake/METRICS_F_DELTA'")
Below is the code I use at the start of the notebook to connect to PySpark:
import findspark
findspark.init()   # point the notebook at the local Spark installation
findspark.find()   # confirm the SPARK_HOME path that was picked up
import pyspark
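The SparkSession itself is then created with a plain builder, along these lines (sketched here for completeness; note that no Delta package is configured at this point):

from pyspark.sql import SparkSession

# Plain session: no Delta jars are requested from the notebook
spark = SparkSession.builder.getOrCreate()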
Below is the error I get:
Py4JJavaError: An error occurred while calling o116.save. : java.lang.ClassNotFoundException: Failed to find data source: delta. Please find packages at http://spark.apache.org/third-party-projects.html
Any suggestions?
I have created a Google Colab/Jupyter Notebook example that shows how to run Delta Lake.
https://github.com/prasannakumar2012/spark_experiments/blob/master/examples/Delta_Lake.ipynb
It has all the steps needed to run. It uses the latest Spark and Delta versions, so please change the versions to match your setup.
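The key step, in short: the Delta package must be on the classpath when the notebook's JVM starts, so configure it on the builder before the first getOrCreate(). A minimal sketch (the coordinates, app name, and path below are examples; match the delta-core Scala suffix and version to your Spark build, e.g. delta-core_2.12 for Spark 3.x, and the two spark.sql.* settings apply from Delta 0.7.0 onward):

from pyspark.sql import SparkSession

# Request the Delta package before the JVM starts; this only takes
# effect if no SparkSession/SparkContext exists in the notebook yet.
spark = (
    SparkSession.builder
    .appName("delta-quickstart")  # example name
    .config("spark.jars.packages", "io.delta:delta-core_2.12:1.0.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Smoke test: write and read back a small Delta table.
spark.range(5).write.format("delta").mode("overwrite").save("/tmp/delta-test")
spark.read.format("delta").load("/tmp/delta-test").show()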