I have a data stream containing Strings which look like JSONArrays. I want to parse those Strings and write to BigQuery table using Apache Beam but am getting an error while writing repeated Strings.
Here´s how I convert my string to TableRow:
String dataString = "[{\"EMAIL\": [\"[email protected]\"]}]";
JSONArray jsonArray = new JSONArray(dataString);
TableRow tableRow = new TableRow();
for (int i = 0; i < jsonArray.length(); i++) {
JSONArray emailArray = new JSONArray(jsonArray.getJSONObject(i).get("EMAIL").toString());
tableRow.set("EMAIL", emailArray); //Results in error
}
Here´s what my BigQuery schema looks like:
[
{
"name": "EMAIL",
"type": "STRING",
"mode": "REPEATED"
}
]
I have managed to write a similar repeated String to BigQuery table using Python but unable to do it using Apache Beam. I suppose I am not saving the right key-value pair in TableRow. The error I am getting now is:
java.io.IOException: Insert failed: [{"errors":[{"debugInfo":"","location":"email","message":"This field is not a record.","reason":"invalid"}],"index":0}]
I need help regarding how to save a similar repeated String to BigQuery without creating a record and would appreciate any advice or suggestions. Thanks in advance.
It seems you want to create
Note that is seems your ValidFrom field is of type STRING, not a repeated field, unless it is wrapped in a repeated field in a hierarchical schema.
In the example code you provided, you are creating a JSONArray and putting it into the STRING field, which I think cause issues as the types are incompatible. If you want to keep it as a plain STRING field, you can use Solution 1 below.
Also make sure that the name of your column in BigQuery matches the one in your code, I see you use both ValidFrom and EMAIL (might be a mistake in your posted code though).
In case you want to add one row with a concatenated String field in BigQuery, you can use the following:
// Initialize your final row
TableRow tableRow = new TableRow();
// Find email addresses
String [] emails = ... // your extraction logic
// Build a concatenated string of emails
String allEmails = String.join(";", emails);
// Add the string field to the row
tableRow.set('EMAILS', allEmails);
In case you want to insert multiple rows, you your create multiple table rows:
// Find email addresses
String [] emails = ... // your extraction logic
// Build a row per email
for(String email: emails) {
// Initialize your final row
TableRow tableRow = new TableRow();
tableRow.set('EMAIL', email);
// TODO: do something with the row (add to list, or ...)
}
In case you want to add one row with a REPEATED STRING field in BigQuery, you can use the following:
// Initialize your final row
TableRow tableRow = new TableRow();
// Find email addresses
String [] emails = ... // your extraction logic
// Build the repeated field
List<String> emailCells = new ArrayList<>();
for(String email: emails) {
emailCells.add(email);
}
// Add the repeated field to the row
tableRow.set('EMAILS', emailCells);
If this is not what you're aiming for, please provide some more details.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With