I want to export a column value from BigQuery so that it looks like this:
| NAME | JSON |
| abc | {"test": 1} |
However, when I export this to a gzipped CSV/TSV file in Google Cloud Storage from Python code with field delimiter = '\t' (https://google-cloud.readthedocs.io/en/latest/bigquery/generated/google.cloud.bigquery.client.Client.extract_table.html), I always get something like:
| NAME | JSON |
| abc | "{""test"": 1}" |
I know about escaping, and I have been trying a lot of possibilities with escaping (using "" to escape the " or adding -values), but I can't seem to get the export as:
{"test": 1}
Please help me?
The tool output is correct, but you'd need to read RFC 4180, the standard for CSV files, to see why.
Basically, the JSON spec says test needs to have double quotes, i.e. "test".
Double quotes around the entire field are allowed in CSV. But the CSV spec also says that in a CSV with quoted fields, an inner quote is doubled. This is rule 7 in section 2 of RFC 4180:
If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:
"aaa","b""bb","ccc"
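To see the same rule applied to the row from the question, here is a minimal sketch using Python's standard csv module with a tab delimiter. It is only an illustration of the quoting rule, not the code BigQuery runs, and write_tsv_row is a made-up helper name.

```python
import csv
import io


def write_tsv_row(fields):
    """Serialize one row as tab-separated text using the csv module's default quoting."""
    buf = io.StringIO()
    # Default quoting (QUOTE_MINIMAL) only quotes fields that need it,
    # e.g. fields that contain the quote character itself.
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    return buf.getvalue()


line = write_tsv_row(["abc", '{"test": 1}'])
print(line, end="")
```

The JSON field contains double quotes, so the writer wraps it in quotes and doubles each inner quote, which is the shape seen in the export.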
So what's the solution?
Possibly, you need an RFC 4180 compliant CSV reader wherever the file is consumed, so that you aren't writing the parsing code yourself.
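As a rough sketch of that approach, Python's built-in csv module understands the doubled quotes and hands back the original JSON text. The sample string below stands in for the exported file, and read_rows is a hypothetical helper; for the real gzipped export you would pass a file object such as gzip.open(path, "rt", newline="") instead of the StringIO.

```python
import csv
import io
import json


def read_rows(tsv_text):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [dict(row) for row in reader]


# Stand-in for the exported file content shown in the question.
exported = 'NAME\tJSON\nabc\t"{""test"": 1}"\n'

rows = read_rows(exported)
# The reader removes the enclosing quotes and collapses "" back to ".
payload = json.loads(rows[0]["JSON"])
print(rows[0]["JSON"], payload["test"])
```

With a compliant reader the file needs no post-processing at all; the quoting is purely a transport detail.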
Alternatively, you could post-process the file: replace each doubled double quote with a single double quote, and remove the quotes next to the braces, like this:
sed -e 's/"{/{/g; s/}"/}/g; s/""/"/g;' in.csv > out.csv
transforming
"{""test"": 1}"
to
{"test": 1}
or using String.replace in JavaScript, but then the resulting CSV file is NOT RFC 4180 compliant.
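If you do want the unquoted form, a somewhat safer variant of the same idea is to parse the file properly and write the fields back joined by tabs, instead of pattern-matching on quotes and braces. This is a sketch only: unquote_tsv is a made-up name, and it assumes no field contains a tab or a line break, since the unquoted output has no way to represent those.

```python
import csv
import io


def unquote_tsv(tsv_text):
    """Re-emit RFC 4180 style tab-separated text with no quoting at all."""
    lines = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        for field in row:
            # Without quoting, these characters would corrupt the row layout.
            if "\t" in field or "\n" in field or "\r" in field:
                raise ValueError("field cannot be written unquoted: %r" % field)
        lines.append("\t".join(row))
    return "".join(line + "\n" for line in lines)


converted = unquote_tsv('NAME\tJSON\nabc\t"{""test"": 1}"\n')
print(converted, end="")
```

As with the sed command, the output is no longer RFC 4180 compliant, so whatever reads it must split on tabs without interpreting quotes.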