
How to set Hadoop fs.s3a.acl.default on AWS EMR?

I have a map-reduce application running on AWS EMR that writes some output to an S3 bucket in a different AWS account. I have the permissions set up and the job can write to the external bucket, but the owner of the written objects is still the root user of the account where the Hadoop job is running. I would like to change this to the external account that owns the bucket.

I found that I can set fs.s3a.acl.default to bucket-owner-full-control; however, that doesn't seem to work. This is what I am doing:

conf.set("fs.s3a.acl.default", "bucket-owner-full-control");
FileSystem fileSystem = FileSystem.get(URI.create(s3Path), conf);
FSDataOutputStream fsDataOutputStream = fileSystem.create(new Path(filePath));
PrintWriter writer  = new PrintWriter(fsDataOutputStream);
writer.write(contentAsString);
writer.close();
fsDataOutputStream.close();

Any help is appreciated.

asked by J.H

2 Answers

conf.set("fs.s3a.acl.default", "bucket-owner-full-control");

is the right property to set.

This is the property in core-site.xml that grants full control to the bucket owner:

<property>
  <name>fs.s3a.acl.default</name>
  <value>bucket-owner-full-control</value>
  <description>Set a canned ACL for newly created and copied objects. Value may be private,
     public-read, public-read-write, authenticated-read, log-delivery-write,
     bucket-owner-read, or bucket-owner-full-control.</description>
</property>

BucketOwnerFullControl

    Specifies that the owner of the bucket is granted Permission.FullControl. The owner of the bucket is not necessarily the same as the owner of the object.

I also recommend setting fs.s3.canned.acl to the value BucketOwnerFullControl, as sketched below.
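
As a minimal sketch in Java, assuming the same Hadoop Configuration object (conf) as in the question; the property names are the ones discussed above, and fs.s3.canned.acl only matters if the job writes through EMR's s3:// connector rather than s3a://:

// Sketch: set the canned ACL for both connectors before the FileSystem is created.
// fs.s3a.acl.default -> Apache Hadoop S3A connector (s3a:// URLs)
// fs.s3.canned.acl   -> EMR's S3 connector (s3:// URLs)
Configuration conf = new Configuration();
conf.set("fs.s3a.acl.default", "bucket-owner-full-control");
conf.set("fs.s3.canned.acl", "BucketOwnerFullControl");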

For debugging, you can use the snippet below to see which parameters are actually being passed:

// Configuration is Iterable<Map.Entry<String, String>>, so this prints every key/value pair
for (Map.Entry<String, String> entry : conf) {
    System.out.printf("%s=%s%n", entry.getKey(), entry.getValue());
}

For testing purposes, run this command from the command line:

aws s3 cp s3://bucket/source/dummyfile.txt s3://bucket/target/dummyfile.txt --sse --acl bucket-owner-full-control

If this works from the CLI, then it will also work through the API.

Bonus point for Spark, useful for Spark Scala users:

For Spark to access the S3 filesystem, set the proper configuration as in the example below:

val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.fast.upload", "true")
hadoopConf.set("mapreduce.fileoutputcommitter.algorithm.version", "2")
hadoopConf.set("fs.s3a.server-side-encryption-algorithm", "AES256")
hadoopConf.set("fs.s3a.canned.acl", "BucketOwnerFullControl")
hadoopConf.set("fs.s3a.acl.default", "BucketOwnerFullControl")

answered by Ram Ghadiyaram

If you are using EMR, then you have to use the AWS team's S3 connector, with "s3://" URLs, and its documented configuration options. They don't support the Apache one, so any option that begins with "fs.s3a" isn't going to have any effect whatsoever. See the sketch below.
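
A minimal sketch of the question's snippet adapted to EMR's connector, assuming s3Path and filePath are s3:// URLs; the helper name is illustrative, and fs.s3.canned.acl is the EMR-side property mentioned in the first answer:

import java.io.PrintWriter;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper mirroring the question's code, but using s3:// URLs
// and the EMR connector's canned-ACL property instead of fs.s3a.*.
public static void writeWithBucketOwnerAcl(String s3Path, String filePath,
                                           String contentAsString) throws Exception {
    Configuration conf = new Configuration();
    // EMR s3:// connector property; fs.s3a.* options are ignored here.
    conf.set("fs.s3.canned.acl", "BucketOwnerFullControl");

    FileSystem fileSystem = FileSystem.get(URI.create(s3Path), conf);
    try (FSDataOutputStream out = fileSystem.create(new Path(filePath));
         PrintWriter writer = new PrintWriter(out)) {
        writer.write(contentAsString);
    } // resources are closed in reverse order: writer flushes, then the stream closes
}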

answered by stevel