In AWS S3 I have JSON documents that I read with AWS Glue's create_dynamic_frame.from_options("s3", ...), and DynamicFrame.printSchema() shows me this, which matches the schema of the documents:
root
|-- updatedAt: string
|-- json: struct
| |-- rowId: int
Then I call unnest() or relationalize() (I have tried both) on the DynamicFrame to get a new dyF, and .printSchema() shows me this, which looks correctly unnested:
root
|-- updatedAt: string
|-- json.rowId: int
The problem is that I can't seem to use the nested fields.
dyF.select_fields(["updatedAt"]) will work and give me a dyF with the "updatedAt" field.
But
dyF.select_fields(["json.rowId"]) gives me an empty dyF.
What am I doing wrong?
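The solution is to wrap the column name in backticks. After unnesting, "json.rowId" is a single top-level column whose name contains a literal dot; without backticks, select_fields appears to read the dot as a path separator and looks for a "rowId" field inside a "json" struct that no longer exists.
Example: dyF.select_fields(["updatedAt", "`json.rowId`"])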
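If you have many flattened columns, quoting them by hand gets tedious. Below is a minimal sketch of a helper that adds the backticks for you. The name quote_path is hypothetical (it is not part of the Glue API), and the select_fields call is left as a comment because it only runs inside a Glue job where dyF exists.

```python
def quote_path(name: str) -> str:
    """Wrap a column name in backticks if it contains a literal dot.

    Hypothetical helper, not part of awsglue. Names that are already
    backtick-quoted, or that contain no dot, are returned unchanged.
    """
    already_quoted = len(name) >= 2 and name.startswith("`") and name.endswith("`")
    if "." in name and not already_quoted:
        return f"`{name}`"
    return name


# Column names as printed by printSchema() after unnest()
fields = ["updatedAt", "json.rowId"]
quoted = [quote_path(f) for f in fields]
print(quoted)

# Inside a Glue job (assumes dyF is the unnested DynamicFrame):
# selected = dyF.select_fields(quoted)
```

Note that this quotes every dotted name, so it is only appropriate after unnesting, when no real nested structs remain that you would want to address by path.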