I've read previous posts on this, but I still cannot pinpoint why I am unable to connect my ipython notebook to a Postgres db.
I am able to launch pyspark in an ipython notebook, SparkContext is loaded as 'sc'.
I have the following in my .bash_profile for finding the Postgres driver:
export SPARK_CLASSPATH=/path/to/downloaded/jar
Here's what I am doing in the ipython notebook to connect to the db (based on this post):
from pyspark.sql import DataFrameReader as dfr
sqlContext = SQLContext(sc)
table= 'some query'
url = 'postgresql://localhost:5432/dbname'
properties = {'user': 'username', 'password': 'password'}
df = dfr(sqlContext).jdbc(
url='jdbc:%s' % url, table=table, properties=properties
)
The error:
Py4JJavaError: An error occurred while calling o156.jdbc.
: java.SQL.SQLException: No suitable driver.
I understand it's an error with finding the driver I've downloaded, but I don't understand why I am getting this error when I've added the path to it in my .bash_profile.
I also tried to set driver via pyspark --jars, but I get a "no such file or directory" error.
This blogpost also shows how to connect to Postgres data sources, but the following also gives me a "no such directory" error:
./bin/spark-shell --packages org.postgresql:postgresql:42.1.4
Additional info:
spark version: 2.2.0
python version: 3.6
java: 1.8.0_25
postgres driver: 42.1.4
I am not sure why the above answer did not work for me but I thought I could also share what actually worked for me when running pyspark from a jupyter notebook (Spark 2.3.1 - Python 3.6.3):
from pyspark.sql import SparkSession
spark = SparkSession.builder.config('spark.driver.extraClassPath', '/path/to/postgresql.jar').getOrCreate()
url = 'jdbc:postgresql://host/dbname'
properties = {'user': 'username', 'password': 'pwd'}
df = spark.read.jdbc(url=url, table='tablename', properties=properties)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With