
Snowflake - how to read metadata from parquet files in S3

We are using external tables in our Snowflake database to read data from some AWS S3 buckets. The buckets contain various Parquet files, spread over multiple partitions.

We are able to read the data from our external table by using Snowflake's stages, storage integrations and file formats.

However, we'd like to read some metadata from the parquet files as well, such as the precision of numeric data types (e.g., to find out how many decimal places we have to deal with).

To keep it simple, let's say we're reading data from one single parquet file.

Is there any way to retrieve metadata from that parquet file as to the precision of numeric data types, directly from Snowflake?

Or would you rather extract that metadata from, let's say, Glue Catalog or any other external tool?

dovregubben asked Oct 22 '25 16:10



1 Answer

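There's a recent public preview feature, INFER_SCHEMA, that will do this. It is a table function that detects the column definitions in staged files, and its output includes each column's name and data type: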

SELECT *
FROM TABLE(
  INFER_SCHEMA(
    LOCATION => '@<stage_name>'  -- internal or external stage
    , FILE_FORMAT => '<format_name>'
  )
);

https://docs.snowflake.com/en/sql-reference/functions/infer_schema.html
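
Once the inferred schema is fetched on the client side, the precision and scale still have to be pulled out of the type text. The following is a minimal Python sketch of that step, assuming the TYPE column comes back as strings such as `NUMBER(10, 2)`. The helper `parse_numeric_type`, the `NumericType` tuple and the sample rows are hypothetical and only illustrate the idea; they are not part of any Snowflake API.

```python
import re
from typing import NamedTuple, Optional


class NumericType(NamedTuple):
    """Precision and scale of a fixed-point column (hypothetical helper type)."""
    precision: int
    scale: int


# Matches type strings such as "NUMBER(38, 0)" or "decimal(10,2)".
_NUMERIC_RE = re.compile(
    r"^\s*(?:NUMBER|DECIMAL|NUMERIC)\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_numeric_type(type_str: str) -> Optional[NumericType]:
    """Return (precision, scale) for a fixed-point type string, else None."""
    match = _NUMERIC_RE.match(type_str)
    if match is None:
        return None
    return NumericType(int(match.group(1)), int(match.group(2)))


# Made-up (COLUMN_NAME, TYPE) pairs standing in for rows of an inferred schema.
rows = [("ID", "NUMBER(38, 0)"), ("PRICE", "NUMBER(10, 2)"), ("NAME", "TEXT")]

# Keep only the columns whose type carries a precision and scale.
numeric_columns = {}
for name, type_str in rows:
    parsed = parse_numeric_type(type_str)
    if parsed is not None:
        numeric_columns[name] = parsed
```

Here `numeric_columns["PRICE"].scale` gives the number of decimal places for the `PRICE` column, and non-numeric columns such as `NAME` are skipped.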

Greg Pavlik answered Oct 24 '25 15:10



