 

PySpark sqlContext read Postgres 9.6 NullPointerException

I'm trying to read a table from a Postgres database with PySpark. I set up the following code and verified that the SparkContext exists:

import os

os.environ['PYSPARK_SUBMIT_ARGS'] = '--driver-class-path /tmp/jars/postgresql-42.0.0.jar --jars /tmp/jars/postgresql-42.0.0.jar pyspark-shell'


from pyspark import SparkContext, SparkConf

conf = SparkConf()
conf.setMaster("local[*]")
conf.setAppName('pyspark')

sc = SparkContext(conf=conf)


from pyspark.sql import SQLContext

properties = {
    "driver": "org.postgresql.Driver"
}
url = 'jdbc:postgresql://tom:@localhost/gqp'

sqlContext = SQLContext(sc)
sqlContext.read \
    .format("jdbc") \
    .option("url", url) \
    .option("driver", properties["driver"]) \
    .option("dbtable", "specimen") \
    .load()

I get the following error:

Py4JJavaError: An error occurred while calling o812.load. : java.lang.NullPointerException

My database is named gqp and the table is specimen. I have verified that the server is running on localhost using the Postgres.app macOS app.

asked Jun 19 '26 17:06 by tom


1 Answer

The URL was the problem!

Originally it was: url = 'jdbc:postgresql://tom:@localhost/gqp'

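I removed the tom:@ part and it worked. The PostgreSQL JDBC driver expects URLs of the form jdbc:postgresql://host:port/database (the port is optional and defaults to 5432), and the user:password@ prefix I had copied directly from a Flask project is not part of that format. Credentials are passed separately instead, for example as the user and password options of Spark's JDBC source. The NullPointerException most likely comes from the driver rejecting the URL it cannot parse (a JDBC driver returns null from connect() for a URL it does not accept), which Spark then tries to use as a connection.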
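A minimal sketch of the corrected read, assuming the same local setup as the question: host and database go in the URL, and credentials, if any, go in the separate user/password options of Spark's JDBC data source. The helpers jdbc_options and read_table are hypothetical names for illustration, not part of PySpark.

```python
def jdbc_options(host, database, table, port=None, user=None, password=None):
    """Build options for Spark's JDBC data source (hypothetical helper)."""
    # pgJDBC URL pattern: jdbc:postgresql://host[:port]/database
    # (if the port is omitted, the driver uses 5432)
    netloc = host if port is None else f"{host}:{port}"
    opts = {
        "url": f"jdbc:postgresql://{netloc}/{database}",
        "driver": "org.postgresql.Driver",
        "dbtable": table,
    }
    # Credentials are separate options, never a user:password@ prefix in the URL
    if user is not None:
        opts["user"] = user
    if password is not None:
        opts["password"] = password
    return opts


def read_table(spark, **kwargs):
    """Load a Postgres table; `spark` may be a SparkSession or an SQLContext."""
    return spark.read.format("jdbc").options(**jdbc_options(**kwargs)).load()
```

With the question's setup this would be called as read_table(sqlContext, host="localhost", database="gqp", table="specimen"), adding user="tom" only if the server actually requires a user name.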

If you're reading this, I hope you didn't make the same mistake :)
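If the connection string already lives in a Flask config, it could be converted rather than copied verbatim. This sketch assumes the Flask project uses SQLAlchemy-style URLs such as postgresql://user:password@host:port/db; sqlalchemy_to_jdbc is a hypothetical helper, and query parameters are ignored for simplicity.

```python
from urllib.parse import urlsplit, unquote


def sqlalchemy_to_jdbc(db_url):
    """Split a SQLAlchemy-style Postgres URL into a JDBC URL and connection properties."""
    parts = urlsplit(db_url)
    # Accept postgresql://, postgres:// and driver variants like postgresql+psycopg2://
    scheme = parts.scheme.split("+")[0]
    if scheme not in ("postgresql", "postgres"):
        raise ValueError(f"not a Postgres URL: {db_url!r}")
    host = parts.hostname or "localhost"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Only host[:port]/database goes into the JDBC URL; the query string is dropped
    jdbc_url = f"jdbc:postgresql://{netloc}{parts.path}"
    props = {"driver": "org.postgresql.Driver"}
    # Credentials move into properties; percent-escapes are decoded
    if parts.username:
        props["user"] = unquote(parts.username)
    if parts.password:
        props["password"] = unquote(parts.password)
    return jdbc_url, props
```

The result can be passed to DataFrameReader.jdbc, e.g. jdbc_url, props = sqlalchemy_to_jdbc("postgresql://tom:@localhost/gqp") followed by sqlContext.read.jdbc(jdbc_url, "specimen", properties=props).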

answered Jun 23 '26 02:06 by tom


