How to write streaming data to S3?

Question

I want to write an RDD[String] to Amazon S3 from Spark Streaming using Scala. The strings are JSON. I'm not sure how to do this efficiently. I found this post, which uses the spark-s3 library. The idea is to create a SparkContext and then a SQLContext. The author of the post then does something like this:

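// Assumes a SQLContext named sqlContext is in scope; its implicits provide rdd.toDF().
import sqlContext.implicits._

myDstream.foreachRDD { rdd =>
  rdd.toDF().write
    .format("com.knoldus.spark.s3")
    .option("accessKey", "s3_access_key")
    .option("secretKey", "s3_secret_key")
    .option("bucket", "bucket_name")
    .option("fileType", "json")
    .save("sample.json")
}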

What other options are there besides spark-s3? Is it possible to append the streaming data to an existing file on S3?

jzonthemtn · Accepted Answer

Files on S3 cannot be appended to. In S3, an "append" means replacing the existing object with a new object that contains the additional data.
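Since an existing object cannot be extended in place, one common workaround is to write each micro-batch to its own path instead of a single growing file. The sketch below is only an illustration of that idea, not something from the original answer: the `BatchKeys` object, the `batchPath` helper, and the `events` prefix are hypothetical names, and the Spark wiring in the trailing comment assumes the Hadoop S3A connector (hadoop-aws) is on the classpath with credentials already configured.

```scala
// Hypothetical helper: the object name, function name, and path layout are
// illustrative assumptions, not part of the original question or answer.
object BatchKeys {
  /** Builds a distinct S3A path per micro-batch so no batch overwrites another. */
  def batchPath(bucket: String, prefix: String, batchMillis: Long): String = {
    // Drop one leading and one trailing slash so the pieces join cleanly.
    val cleanPrefix = prefix.stripPrefix("/").stripSuffix("/")
    if (cleanPrefix.isEmpty) s"s3a://$bucket/batch-$batchMillis"
    else s"s3a://$bucket/$cleanPrefix/batch-$batchMillis"
  }
}

// Sketch of the wiring inside a Spark Streaming job. It is left as a comment
// because it needs Spark and the S3A connector to compile and run:
//
//   myDstream.foreachRDD { (rdd, time) =>
//     if (!rdd.isEmpty()) {
//       rdd.saveAsTextFile(
//         BatchKeys.batchPath("bucket_name", "events", time.milliseconds))
//     }
//   }
```

Note that `saveAsTextFile` writes a directory of part files under the given path rather than a single file, so each batch ends up as its own set of objects under a common prefix. Downstream readers can then treat the prefix as the logical "appended" dataset.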

Tags:

amazon-web-services

amazon-s3

scala

apache-spark

spark-streaming

Lobsterrrr
