
"Relative path in absolute URI" exception while accessing DynamoDB via the Glue Data Catalog in PySpark running on EMR

I am running a PySpark application on AWS EMR that is configured to use the AWS Glue Data Catalog as its metastore. I have a table set up in AWS Glue that points to a DynamoDB table, and I am trying to access that Glue table from my PySpark script. I can run show tables and see the Glue table, but when I try to query it, I get the exception below:

pyspark.sql.utils.AnalysisException: u'java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: arn:aws:dynamodb:<region>:<acct_id>:table/DDBTABLE;'

My query in pyspark script:

spark.sql("select * from ddbtable").show()

I couldn't find any good reference on this. I see people describing a similar issue with spark.sql.warehouse.dir, but I am not sure how that relates to the Glue Data Catalog. Any input?

ranjith asked Oct 22 '25 18:10



1 Answer

I contacted AWS tech support, and apparently this is an issue with EMR (as of 5.23.0) when using the Glue Data Catalog to access a Glue table that points to DynamoDB. They are still working on it and, in the meantime, have provided the workaround below.

Edit the properties of the Glue table as follows:

Update: set the Location property to a dummy S3 location, so that it has the form s3://dummy-path

Add: add the following DynamoDB-specific entries under parameters:

"dynamodb.table.name": "ddb-table",
"dynamodb.column.mapping": "col:col",
"storage_handler": "org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler"

For how to update the Glue table, refer here
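If you would rather script the change than edit the table by hand, the following is a minimal sketch of the same workaround using boto3's Glue client (`get_table` / `update_table`). It is not something AWS support provided. It assumes that "parameters" means the table-level `Parameters` map, that the Location lives in `StorageDescriptor.Location`, and that `update_table` only accepts the `TableInput` fields, so read-only fields returned by `get_table` are dropped. The function names and the example database name are hypothetical.

```python
import copy

# Fields that Glue's TableInput structure accepts; get_table also returns
# read-only fields (DatabaseName, CreateTime, ...) that must be dropped.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}

STORAGE_HANDLER = "org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler"


def build_table_input(table, dummy_location, ddb_table_name, column_mapping):
    """Return a TableInput dict with the workaround applied.

    `table` is the "Table" dict returned by get_table; it is not modified.
    """
    table_input = {
        key: copy.deepcopy(value)
        for key, value in table.items()
        if key in TABLE_INPUT_KEYS
    }
    # Replace the DynamoDB ARN with a dummy S3 location.
    storage = table_input.setdefault("StorageDescriptor", {})
    storage["Location"] = dummy_location
    # Assumption: the DynamoDB entries go into the table-level Parameters.
    params = table_input.setdefault("Parameters", {})
    params.update({
        "dynamodb.table.name": ddb_table_name,
        "dynamodb.column.mapping": column_mapping,
        "storage_handler": STORAGE_HANDLER,
    })
    return table_input


def apply_workaround(glue_client, database, table_name,
                     dummy_location, ddb_table_name, column_mapping):
    """Fetch the Glue table, patch it, and write it back."""
    table = glue_client.get_table(DatabaseName=database, Name=table_name)["Table"]
    table_input = build_table_input(
        table, dummy_location, ddb_table_name, column_mapping
    )
    glue_client.update_table(DatabaseName=database, TableInput=table_input)
    return table_input
```

To use it, create the client with `boto3.client("glue")` and call, for example, `apply_workaround(glue, "my_database", "ddbtable", "s3://dummy-path", "ddb-table", "col:col")`, where `my_database` stands in for your own Glue database name. Check the table in the Glue console afterwards before re-running the Spark query.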

ranjith answered Oct 25 '25 00:10



