
How to refer deltalake tables in jupyter notebook using pyspark

I'm trying to start using Delta Lake with PySpark.

To be able to use Delta Lake, I invoke pyspark on the Anaconda shell prompt as:

pyspark --packages io.delta:delta-core_2.11:0.3.0

Here is the reference from the Delta Lake docs: https://docs.delta.io/latest/quick-start.html

All Delta Lake commands work fine from the Anaconda shell prompt.

In a Jupyter notebook, however, referencing a Delta Lake table gives an error. Here is the code I am running in the notebook:

df_advisorMetrics.write.mode("overwrite").format("delta").save("/DeltaLake/METRICS_F_DELTA")
spark.sql("create table METRICS_F_DELTA using delta location '/DeltaLake/METRICS_F_DELTA'")

Below is the code I use at the start of the notebook to connect to PySpark:

import findspark
findspark.init()
findspark.find()

import pyspark
findspark.find()
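A likely cause of the ClassNotFoundException below: findspark.init() alone does not pass the --packages flag to the JVM that Jupyter launches, so the Delta jar is never loaded. A minimal sketch of one common fix, assuming the same Delta coordinate as the shell command, is to set PYSPARK_SUBMIT_ARGS before findspark.init() / import pyspark run:

```python
import os

# Must run BEFORE findspark.init() / "import pyspark" starts the JVM;
# it mirrors `pyspark --packages io.delta:delta-core_2.11:0.3.0` on the shell.
# The trailing "pyspark-shell" token is required by PySpark's launcher.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages io.delta:delta-core_2.11:0.3.0 pyspark-shell"
)
```

With this set, Spark resolves the package from Maven on session startup and `format("delta")` becomes available.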

Below is the error I get:

Py4JJavaError: An error occurred while calling o116.save. : java.lang.ClassNotFoundException: Failed to find data source: delta. Please find packages at http://spark.apache.org/third-party-projects.html

Any suggestions?

asked Oct 24 '25 by Mauryas

1 Answer

I have created a Google Colab/Jupyter Notebook example that shows how to run Delta Lake.

https://github.com/prasannakumar2012/spark_experiments/blob/master/examples/Delta_Lake.ipynb

It has all the steps needed to run Delta Lake. It uses the latest Spark and Delta versions at the time of writing, so please change the versions to match your environment.
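The coordinate passed to --packages encodes both the Scala build and the Delta release, so "change the versions accordingly" means editing two fields. A small illustrative helper (the function name is ours, not part of any API):

```python
def delta_coordinate(scala_build: str, delta_version: str) -> str:
    """Build the Maven coordinate passed to --packages / spark.jars.packages.

    scala_build must match the Scala version Spark was compiled against
    (2.11 for Spark 2.4.x, 2.12 for Spark 3.x); delta_version must be a
    Delta release compatible with that Spark line.
    """
    return f"io.delta:delta-core_{scala_build}:{delta_version}"

# The coordinate used in the question:
print(delta_coordinate("2.11", "0.3.0"))  # io.delta:delta-core_2.11:0.3.0
```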

answered Oct 26 '25 by Prasanna


