
How do I add headers to the output CSV in Apache Beam / Dataflow?

I noticed that the Java SDK has a function that lets you write a header to a CSV file: https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Write.html#withHeader-java.lang.String-

Is this feature available in the Python SDK as well?

agsolid asked Dec 29 '25 00:12


1 Answer

You can now write to a text file and specify a header using the text sink.

From the documentation:

class apache_beam.io.textio.WriteToText(
    file_path_prefix, file_name_suffix='',
    append_trailing_newlines=True, num_shards=0, 
    shard_name_template=None, coder=ToStringCoder, 
    compression_type='auto', header=None)

So you can use the following piece of code:

beam.io.WriteToText(bucket_name, file_name_suffix='.csv', 
    header='colname1, colname2')
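
For a slightly fuller picture, here is a minimal sketch of how the header might be wired into a small pipeline. The column names, the sample rows, and the helper to_csv_line are hypothetical and only for illustration; the header itself is a single plain string, so building it with the standard csv module keeps its quoting consistent with the data rows. The Beam import sits inside run() so that the formatting helper can be tried without Beam installed.

```python
import csv
import io


def to_csv_line(values):
    """Format one row as a CSV line, without the trailing line terminator."""
    buf = io.StringIO()
    csv.writer(buf).writerow(values)
    return buf.getvalue().rstrip("\r\n")


# Hypothetical column names, used only for this example.
COLUMNS = ["colname1", "colname2"]
HEADER = to_csv_line(COLUMNS)


def run(output_prefix):
    """Write a few sample rows as CSV under output_prefix (needs apache_beam)."""
    # Imported here so the helper above can be used without Beam installed.
    import apache_beam as beam

    rows = [("a", 1), ("b, with comma", 2)]  # hypothetical sample data
    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "CreateRows" >> beam.Create(rows)
            | "FormatCsv" >> beam.Map(to_csv_line)
            | "WriteCsv" >> beam.io.WriteToText(
                output_prefix,
                file_name_suffix=".csv",
                header=HEADER,
            )
        )
```

Calling run() with a local path or a bucket path as output_prefix should produce one or more .csv shards. As far as I know, the header is written at the top of every shard, so if you need a single file with a single header line you would probably also want to pass num_shards=1.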

The complete documentation, including the source if you want to check how the header is implemented, is available here: https://beam.apache.org/documentation/sdks/pydoc/2.0.0/_modules/apache_beam/io/textio.html#WriteToText

nlassaux answered Dec 30 '25 16:12


