
PySpark - partitionBy to S3 handle special character

I have a column called target_col_a in my DataFrame. It holds Timestamp values that have been cast to String, e.g. 2020-05-27 08:00:00.

I then partition by this column, as shown below.

target_dataset \
    .write.mode('overwrite') \
    .format('parquet') \
    .partitionBy('target_col_a') \
    .save('s3://my-bucket/my-path')

However, my S3 path ends up as s3://my-bucket/my-path/target_col_a=2020-05-27 08%3A00%3A00/part-0-file1.snappy.parquet

Is there a way to write the partition without the %3A, keeping the : as is?

Note: when I use the Glue native DynamicFrame to write to S3, or Redshift UNLOAD to S3, the partition path comes out as desired (with : and without %3A), e.g.

glueContext.write_dynamic_frame.from_options(
    frame = target_dataset,
    connection_type = "s3",
    connection_options = {
        "path": "s3://my-bucket/my-path/",
        "partitionKeys": ["target_col_a"]},
    format = "parquet",
    transformation_ctx = "datasink2"
)
nsc060 asked Nov 04 '25 12:11


2 Answers

The short answer is no, you can't.

PySpark uses the Hadoop client libraries for input and output. These libraries build paths with the Java URI package. Characters such as colons are not safe in those paths, so they are percent-encoded before writing (: becomes %3A, as in the path in the question). PySpark decodes them automatically when the dataset is read, but if you want to access the dataset outside of Spark or Hadoop, you will need to URL-decode the partition values yourself.
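If you do need to read the partition values outside of Spark, decoding could look roughly like the sketch below. It uses only the standard library; the helper name partition_values is made up for illustration, and it assumes the usual name=value layout of partition directories.

```python
from urllib.parse import unquote


def partition_values(key: str) -> dict:
    """Return the decoded partition columns found in an S3 object key.

    Hypothetical helper: assumes Hive-style `name=value` path segments,
    as produced by partitionBy in the question.
    """
    values = {}
    for segment in key.split("/"):
        if "=" not in segment:
            continue  # bucket, prefix, or data file name
        # Split on the first "=" before decoding, so an encoded "="
        # inside the value cannot be mistaken for the separator.
        name, _, raw = segment.partition("=")
        values[unquote(name)] = unquote(raw)
    return values


key = "my-path/target_col_a=2020-05-27 08%3A00%3A00/part-0-file1.snappy.parquet"
print(partition_values(key))
```

For the key from the question this gives back the original string value with the colons restored.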

Dave answered Nov 06 '25 03:11


Special characters such as spaces and : are not among the characters that are generally safe in S3 key names. Even if you somehow manage to create such a key, you are likely to run into difficulties every time you use it.

It is better to replace these characters with safe ones.

Follow the key naming conventions described in the Object Key Guidelines section of the Amazon S3 documentation:

The following character sets are generally safe for use in key names:

Alphanumeric characters [0-9a-zA-Z]

Special characters !, -, _, ., *, ', (, and )
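One way to apply this before writing is to rewrite the partition value so that it only contains characters from the list above. The sketch below is plain Python; the function name to_safe_key_part and the choice of - as the replacement character are assumptions made for illustration. Note that the mapping is lossy: a space and a : both become -.

```python
import re

# Anything outside the "generally safe" set quoted above:
# alphanumerics plus ! - _ . * ' ( )
_UNSAFE = re.compile(r"[^0-9a-zA-Z!\-_.*'()]")


def to_safe_key_part(value: str, replacement: str = "-") -> str:
    """Replace every character outside the safe set (hypothetical helper)."""
    return _UNSAFE.sub(replacement, value)


print(to_safe_key_part("2020-05-27 08:00:00"))
```

In a Spark job the same idea could probably be applied to target_col_a before calling partitionBy, for example with regexp_replace or date_format from pyspark.sql.functions; that part is not shown here and the exact pattern would need checking against Spark's Java regex syntax.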

QuickSilver answered Nov 06 '25 04:11


