I'm trying to query a Kerberized Hive cluster with SQLAlchemy. I'm able to submit queries using pyhs2, which confirms that it's possible to connect to and query Hive when authenticated by Kerberos:
import pyhs2

with pyhs2.connect(host='hadoop01.woolford.io',
                   port=10500,
                   authMechanism='KERBEROS') as conn:
    with conn.cursor() as cur:
        cur.execute('SELECT * FROM default.mytable')
        records = cur.fetchall()
        # etc ...
I notice that Airbnb's Airflow uses SQLAlchemy and can connect to Kerberized Hive, so I imagine it's possible to do something like this:
engine = create_engine('hive://hadoop01.woolford.io:10500/default', connect_args={'?': '?'})
connection = engine.connect()
connection.execute("SELECT * FROM default.mytable")
# etc ...
I'm not sure what parameters should be set in the connect_args dictionary. Can you see what needs to be added to make this work (e.g. Kerberos service name, realm, etc.)?
Under the hood, SQLAlchemy uses PyHive to connect to Hive. The current version of PyHive, v0.2.1, doesn't support Kerberos.
I notice that someone from Yahoo created a pull request that provides support for Kerberos. This PR has not yet been merged/released, so I copied the code from the PR into /usr/lib/python2.7/site-packages/pyhive/hive.py on the Superset server and created a connection like this:
engine = create_engine('hive://hadoop01:10500', connect_args={'auth': 'KERBEROS', 'kerberos_service_name': 'hive'})
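With that patched hive.py in place, the engine behaves like any other SQLAlchemy engine. A minimal sketch of running a query end to end (the host, port, and table name are the placeholders from the question, and this assumes a valid Kerberos ticket is already in the credential cache):

from sqlalchemy import create_engine

# Assumes the Kerberos-patched pyhive/hive.py and a valid ticket (e.g. from kinit)
engine = create_engine('hive://hadoop01:10500/default',
                       connect_args={'auth': 'KERBEROS', 'kerberos_service_name': 'hive'})

connection = engine.connect()
result = connection.execute('SELECT * FROM default.mytable')
for row in result:
    print(row)
connection.close()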
Hopefully, the maintainer of PyHive will merge/release the support for Kerberos.
Install these libraries (PyHive plus its SASL dependencies; typically pyhive, thrift, sasl, and thrift_sasl), get your Kerberos ticket (e.g. with kinit), and then:
engine = create_engine('hive://HOST:10500/DB_NAME',
                       connect_args={'auth': 'KERBEROS', 'kerberos_service_name': 'hive'})
PS: the /DB_NAME part of the URL is optional.
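Once the ticket is in place, the engine works with anything that accepts a SQLAlchemy engine or connection. As a hedged sketch of one common pattern, here is a query result pulled into a pandas DataFrame (HOST, DB_NAME, and mytable are placeholders):

import pandas as pd
from sqlalchemy import create_engine

# Assumes a valid Kerberos ticket (e.g. from kinit user@REALM) is in the cache
engine = create_engine('hive://HOST:10500/DB_NAME',
                       connect_args={'auth': 'KERBEROS', 'kerberos_service_name': 'hive'})

# pandas.read_sql accepts any SQLAlchemy engine
df = pd.read_sql('SELECT * FROM mytable LIMIT 10', engine)
print(df.head())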