AWS Glue: Keep partitioned column as value in row after writing

Question

Does anyone know whether it's possible to tell the Glue writer to keep the column you're partitioning on in the actual dataframe?

https://aws.amazon.com/blogs/big-data/work-with-partitioned-data-in-aws-glue/

Here, $outpath is a placeholder for the base output path in S3. The partitionKeys parameter can also be specified in Python in the connection_options dict:

glue_context.write_dynamic_frame.from_options(
    frame = projectedEvents, 
    connection_options = {"path": "$outpath", "partitionKeys": ["type"]}, 
    format = "parquet")

When you execute this write, the type field is removed from the individual records and is encoded in the directory structure.

I would like to keep the type field in the individual record.

Robert Kossendey · Accepted Answer

I am not 100% sure whether it is possible to tell Glue to keep the column, but in the meantime you could use this workaround, which copies the column and partitions on the copy:

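from awsglue.dynamicframe import DynamicFrame

# withColumn is a Spark DataFrame method, so convert the DynamicFrame first
df = projectedEvents.toDF()
df = df.withColumn("type_partition", df["type"])
projectedEvents = DynamicFrame.fromDF(df, glue_context, "projectedEvents")

glue_context.write_dynamic_frame.from_options(
    frame=projectedEvents,
    connection_type="s3",  # required argument; the output path is on S3
    connection_options={"path": "$outpath", "partitionKeys": ["type_partition"]},
    format="parquet"
)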
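
To see why partitioning on a copy keeps the original column, here is a minimal plain-Python sketch of Hive-style partitioning. It is only a simulation of the behavior described in the blog post, not Glue's actual writer; the helper name partition_records and the sample events are made up for illustration.

```python
from collections import defaultdict


def partition_records(records, partition_keys):
    """Group records Hive-style: partition values move into the path, out of the row."""
    files = defaultdict(list)
    for record in records:
        # Build the key=value directory segments, e.g. "type=PushEvent"
        prefix = "/".join(f"{key}={record[key]}" for key in partition_keys)
        # Partition columns are not stored in the data file itself
        row = {k: v for k, v in record.items() if k not in partition_keys}
        files[prefix].append(row)
    return dict(files)


# Hypothetical sample data
events = [
    {"id": 1, "type": "PushEvent"},
    {"id": 2, "type": "ForkEvent"},
]

# Partitioning on "type" directly: the column disappears from the rows.
dropped = partition_records(events, ["type"])

# Workaround: duplicate the column first and partition on the copy.
with_copy = [{**e, "type_partition": e["type"]} for e in events]
kept = partition_records(with_copy, ["type_partition"])

print(dropped)
print(kept)
```

With the workaround, only type_partition is moved into the directory names, so type stays in every row. One side effect to expect: a reader that infers partition columns from the directory layout will likely expose both type and type_partition.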

Tags:

amazon-web-services

aws-glue

Mat
