
Google-BigQuery - schema parsing of CSV file

We are using the Java API to load a CSV file into Google BigQuery. Is there a way to detect the column types on load and automatically select the appropriate schema?

For example, if a column contains only floats, BigQuery would assign it the float type; if it contains non-numeric values, BigQuery would assign it the string type. Is there a method to do this?

The roundabout way is to load every column as a string by default, then run a query like this on each column:

SELECT count(columnname) - count(float(columnname)) FROM dataset.table

A result of 0 would mean that every non-null value in the column converts to float. (I am only interested in isolating the columns that hold float values, so that my application can use them in math functions.)

Is there any other way to solve this problem?

deepakd asked Dec 01 '25 03:12


1 Answer

Right now, BigQuery does not support schema inference, so as you suggest, your options are:

  1. Provide the schema explicitly when loading data.
  2. Load all data using the string type, and cast/convert at query time.

If you load everything as strings, you can clean up and rewrite your imported data with a query that uses the allowLargeResults feature. You'll be charged for that query, which will increase your data ingestion costs.
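For option 1, one possible approach is to work out the column types on the client before the load and pass the result as the explicit schema. The following is only a minimal sketch of that idea, not anything BigQuery provides: the class `CsvSchemaSniffer` and its methods are hypothetical names, and it assumes the first line is a header, fields are comma-separated, and no field contains quoted commas or embedded newlines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; not part of any BigQuery client library.
final class CsvSchemaSniffer {

    // Plain decimal or scientific notation only. This is deliberately
    // stricter than Double.parseDouble, which also accepts forms such as "NaN".
    private static final Pattern FLOAT_PATTERN =
            Pattern.compile("[+-]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][+-]?\\d+)?");

    // Returns one type name per column: "FLOAT" if every non-empty value
    // in the column looks numeric, otherwise "STRING".
    // Assumption: lines.get(0) is the header row.
    static List<String> inferTypes(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("expected at least a header line");
        }
        int columns = lines.get(0).split(",", -1).length;
        boolean[] sawValue = new boolean[columns];
        boolean[] allFloat = new boolean[columns];
        Arrays.fill(allFloat, true);

        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(",", -1);
            for (int i = 0; i < columns && i < fields.length; i++) {
                String value = fields[i].trim();
                if (value.isEmpty()) {
                    continue; // empty fields are treated as missing values
                }
                sawValue[i] = true;
                if (!FLOAT_PATTERN.matcher(value).matches()) {
                    allFloat[i] = false;
                }
            }
        }

        List<String> types = new ArrayList<>();
        for (int i = 0; i < columns; i++) {
            // A column with no values at all falls back to STRING.
            types.add(sawValue[i] && allFloat[i] ? "FLOAT" : "STRING");
        }
        return types;
    }
}
```

Calling `CsvSchemaSniffer.inferTypes(lines)` on the lines of the file (or on a sample of them) gives a list of type names in column order, which you would then map onto the schema fields of your load job. Sampling only part of the file is cheaper but risks a load failure if a later row holds a non-numeric value in a column inferred as FLOAT.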

Jeremy Condit answered Dec 04 '25 12:12


