I have a dataframe which I want to write it as single json file with a specific name. I tried below
df2 = df1.select(df1.col1,df1.col2)
df2.write.format('json').save('/path/file_name.json') # didnt work, writing in folder 'file_name.json' and files with part-XXX
df2.toJSON().saveAsTextFile('/path/file_name.json')  # didnt work, writing in folder 'file_name.json' and files with part-XXX
Appreciate if some one can provide a solution.
In Spark, you can save (write/extract) a DataFrame to a CSV file on disk by using dataframeObj. write. csv("path") , using this you can also write DataFrame to AWS S3, Azure Blob, HDFS, or any Spark supported file systems.
The explode function explodes the dataframe into multiple rows.
Spark JSON data source API provides the multiline option to read records from multiple lines. By default, spark considers every record in a JSON file as a fully qualified record in a single line hence, we need to use the multiline option to process JSON from multiple lines.
You need to save this on single file using below code:-
df2 = df1.select(df1.col1,df1.col2)
df2.coalesce(1).write.format('json').save('/path/file_name.json')
This will make a folder with file_name.json. Check this folder you can get a single file with whole data part-000
You can do it by converting to a pandas df previously:
df.toPandas().to_json('path/file_name.json', orient='records', force_ascii=False, lines=True)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With