The Apache Drill features list mentions that it can query data from Google Cloud Storage, but I can't find any information on how to do that. I've got it working fine with S3, but suspect i'm missing something very simple in terms of Google Cloud Storage.
Does anyone have an example Storage Plugin configuration for Google Cloud Storage?
Thanks
M
This is quite an old question, so I imagine you either found a solution or moved on with your life, but for anyone looking for a solution without using Dataproc, here's a solution:
<property>
<name>fs.gs.project.id</name>
<value>YOUR_PROJECT_ID</value>
<description>
Optional. Google Cloud Project ID with access to GCS buckets.
Required only for list buckets and create bucket operations.
</description>
</property>
<property>
<name>fs.gs.auth.service.account.private.key.id</name>
<value>YOUR_PRIVATE_KEY_ID</value>
</property>
<property>
<name>fs.gs.auth.service.account.private.key</name>
<value>-----BEGIN PRIVATE KEY-----\nYOUR_PRIVATE_KEY\n-----END PRIVATE KEY-----\n</value>
</property>
<property>
<name>fs.gs.auth.service.account.email</name>
<value>YOUR_SERVICE_ACCOUNT_EMAIL/value>
<description>
The email address is associated with the service account used for GCS
access when fs.gs.auth.service.account.enable is true. Required
when authentication key specified in the Configuration file (Method 1)
or a PKCS12 certificate (Method 3) is being used.
</description>
</property>
<property>
<name>fs.gs.working.dir</name>
<value>/</value>
<description>
The directory relative gs: uris resolve in inside of the default bucket.
</description>
</property>
<property>
<name>fs.gs.implicit.dir.repair.enable</name>
<value>true</value>
<description>
Whether or not to create objects for the parent directories of objects
with / in their path e.g. creating gs://bucket/foo/ upon deleting or
renaming gs://bucket/foo/bar.
</description>
</property>
<property>
<name>fs.gs.glob.flatlist.enable</name>
<value>true</value>
<description>
Whether or not to prepopulate potential glob matches in a single list
request to minimize calls to GCS in nested glob cases.
</description>
</property>
<property>
<name>fs.gs.copy.with.rewrite.enable</name>
<value>true</value>
<description>
Whether or not to perform copy operation using Rewrite requests. Allows
to copy files between different locations and storage classes.
</description>
</property>
Start Apache Drill.
Add a custom storage to Drill.
You're good to go.
The solution is from here, where I detail some more about what we do around data exploration with Apache Drill.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With