I am trying to flatten input JSON data that has two map/dictionary fields (custom_event1 and custom_event2), which may contain arbitrary key-value pairs. To create an output table from the data frame, I need to avoid flattening custom_event1 and custom_event2 and instead store each one as a JSON string in its own column.
Following this doc, Relationalize.apply flattens the custom_event maps as well.
Sample JSON:
{
  "id": "sklfsdfskdlfsdfsdfkhsdfssdf",
  "idtype": "cookieId",
  "event": "install",
  "sub_event": null,
  "ip": "XXXXXX",
  "geo": {
    "country": "IN",
    "city": null,
    "region": null
  },
  "carrier": {
    "operator": null,
    "network": null,
    "connection_type": null
  },
  "user_agent": "Mozilla/5.0",
  "device": {
    "brand": "LYF",
    "model": null,
    "type": null
  },
  "package": {
    "pkgName": "XXXXXXXX",
    "pkgVersion": "1.5.6.3",
    "pkgRating": null,
    "timestamp": "2017-12-14 11:51:27"
  },
  "custom_event1": {
    "key1": "value1",
    "key2": "value2"
  },
  "custom_event2": {
    "key": "value"
  }
}
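For illustration, the desired output shape can be sketched in plain Python. This is not Glue's Relationalize (which names columns differently and handles arrays); the flatten helper, the dotted column names, and the keep_as_json parameter are all illustrative:

```python
import json

def flatten(record, keep_as_json=("custom_event1", "custom_event2"), prefix=""):
    """Flatten nested dicts into dotted column names, but serialize the
    dynamic map fields to JSON strings instead of flattening them."""
    row = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if key in keep_as_json and isinstance(value, dict):
            # Keep the whole map as one string column
            row[name] = json.dumps(value)
        elif isinstance(value, dict):
            row.update(flatten(value, keep_as_json, prefix=f"{name}."))
        else:
            row[name] = value
    return row

event = {
    "id": "sklfsdfskdlfsdfsdfkhsdfssdf",
    "geo": {"country": "IN", "city": None},
    "custom_event1": {"key1": "value1", "key2": "value2"},
}
row = flatten(event)
# geo is flattened into geo.country / geo.city, while custom_event1
# stays a single JSON-string column
print(row)
```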
How do I store JSON data with a dynamic map field in relational storage?
The steps you would need, assuming the JSON data is in S3:
Create a crawler in AWS Glue and let it create a schema in the Data Catalog (database). This assumes you are at least a little familiar with AWS Glue.
Create a Glue job that transforms the JSON into your favorite format (e.g. Parquet), using a transform step to flatten the nested data with the Relationalize class - https://aws.amazon.com/blogs/big-data/simplify-querying-nested-json-with-the-aws-glue-relationalize-transform/ - and writes it out as Parquet.
Create a crawler for the new flattened data and create the table in AWS Glue.
Use Athena, Amazon QuickSight, or your favorite BI tool to query the Parquet data.
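To keep the custom_event maps from being flattened, you can convert them to JSON strings with Spark's to_json() before applying Relationalize, so each survives as a single string column. A sketch of such a Glue job follows; it only runs inside the AWS Glue runtime, and the database, table, and S3 paths ("mydb", "events", "s3://my-bucket/...") are placeholders:

```python
import sys
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.transforms import Relationalize
from pyspark.context import SparkContext
from pyspark.sql.functions import to_json

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the crawled table (database/table names are placeholders)
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="mydb", table_name="events"
)

# Convert the dynamic map fields to JSON strings so Relationalize
# treats them as plain string columns instead of flattening them
df = dyf.toDF()
for col in ("custom_event1", "custom_event2"):
    df = df.withColumn(col, to_json(df[col]))
dyf = DynamicFrame.fromDF(df, glue_context, "events_with_json_strings")

# Flatten the remaining nested structs (geo, carrier, device, package)
flat = Relationalize.apply(
    frame=dyf, staging_path="s3://my-bucket/tmp/", name="root"
).select("root")

# Write out as Parquet for Athena/QuickSight
glue_context.write_dynamic_frame.from_options(
    frame=flat,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/flattened/"},
    format="parquet",
)
```

In Athena, the custom_event1 column can then be queried with the JSON functions (e.g. json_extract_scalar) without the table schema ever needing to know the map's keys.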