 

Persisting, mounting and sharing volumes in EMR

Does AWS provide any storage solutions that satisfy the following criteria?

  1. can be mounted on the master node of an EMR cluster as an OS directory (e.g. under /mnt)
  2. would outlive the EMR cluster if the cluster is terminated or deleted
  3. can be accessed simultaneously by multiple EC2 instances (in EMR or not)

In my mind, an NFS-like volume would satisfy all three, but I don't know whether EBS, EFS and/or EMRFS can be used that way. At a minimum, I am looking for something that gives me (1) and (2).


Background: EBS

In the context of the questions above, I looked into EBS, but I found conflicting information on this topic.

  • The EMR documentation says that EBS volumes are ephemeral in EMR:

    Amazon EBS works differently within Amazon EMR than it does with regular Amazon EC2 instances. Amazon EBS volumes attached to EMR clusters are ephemeral: the volumes are deleted upon cluster and instance termination (for example, when shrinking instance groups), so it’s important that you not expect data to persist

  • Meanwhile, I see an option called "Delete on termination" in EBS that can be set to False; see the screenshot below.


Amelio Vazquez-Reina asked Jan 20 '26 14:01



1 Answer

Amazon EFS is the service you are looking for. You can mount it on EC2 instances running in multiple Availability Zones within the same region.

The EC2 instances mount Amazon EFS file systems via the NFSv4 protocol, using standard operating system mount commands.
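As a rough illustration of that mount step, here is a small Python helper that builds (but does not run) the standard NFSv4 mount command. The file system ID `fs-12345678`, the region, the mount point `/mnt/efs`, and the helper name `efs_mount_command` are hypothetical placeholders. The DNS name format and mount options follow the ones AWS documents for plain NFS clients, but check the current EFS user guide before relying on them. The instance also needs a mount target in its Availability Zone and a security group that allows NFS traffic (TCP port 2049).

```python
from __future__ import annotations

import shlex

# NFS mount options AWS documents for EFS clients (verify against current docs).
EFS_NFS_OPTIONS = "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"


def efs_mount_command(file_system_id: str, region: str, mount_point: str) -> list[str]:
    """Build (but do not execute) the NFSv4 mount command for an EFS file system."""
    if not file_system_id.startswith("fs-"):
        raise ValueError(f"not an EFS file system ID: {file_system_id!r}")
    # Regional EFS DNS name; inside the VPC it resolves to the mount target
    # in the caller's Availability Zone.
    source = f"{file_system_id}.efs.{region}.amazonaws.com:/"
    return ["sudo", "mount", "-t", "nfs4", "-o", EFS_NFS_OPTIONS, source, mount_point]


if __name__ == "__main__":
    # Hypothetical file system ID and region, for illustration only.
    cmd = efs_mount_command("fs-12345678", "us-east-1", "/mnt/efs")
    print(shlex.join(cmd))
```

The printed string is what you would run on each instance (after `sudo mkdir -p /mnt/efs`); any number of instances can mount the same file system at once.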

You can also mount the EFS file system on every node of the EMR cluster through a bootstrap action.
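A minimal sketch of such a bootstrap action, assuming the script is uploaded to S3 and receives the file system ID, region, and mount point as arguments. The script name `mount-efs.sh`, the bucket `s3://my-bucket/...`, and the helper `efs_bootstrap_action` are hypothetical. The returned dictionary follows the shape of one entry in the `BootstrapActions` list accepted by EMR's `RunJobFlow` API (for example boto3's `run_job_flow`); boto3 itself is not called here.

```python
from __future__ import annotations

# Contents for a hypothetical mount-efs.sh, to be uploaded to S3.
BOOTSTRAP_SCRIPT = """#!/bin/bash
# Mount an EFS file system on this EMR node.
# Usage: mount-efs.sh <file-system-id> <region> [mount-point]
set -euo pipefail
FS_ID="$1"
REGION="$2"
MOUNT_POINT="${3:-/mnt/efs}"
sudo mkdir -p "$MOUNT_POINT"
sudo mount -t nfs4 \\
  -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport \\
  "${FS_ID}.efs.${REGION}.amazonaws.com:/" "$MOUNT_POINT"
"""


def efs_bootstrap_action(
    script_s3_uri: str, file_system_id: str, region: str, mount_point: str = "/mnt/efs"
) -> dict:
    """Return one entry for the BootstrapActions list of EMR's RunJobFlow API."""
    return {
        "Name": "Mount EFS",
        "ScriptBootstrapAction": {
            "Path": script_s3_uri,
            # Passed to the script as $1, $2, $3.
            "Args": [file_system_id, region, mount_point],
        },
    }


if __name__ == "__main__":
    # Hypothetical bucket, file system ID, and region.
    action = efs_bootstrap_action(
        "s3://my-bucket/bootstrap/mount-efs.sh", "fs-12345678", "us-east-1"
    )
    print(action)
    # With boto3, this would be passed as
    # boto3.client("emr").run_job_flow(..., BootstrapActions=[action])
```

Because bootstrap actions run on each node as it joins the cluster, nodes added later by resizing should also get the mount. The data itself lives in EFS, so it is unaffected when the cluster terminates.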

It will satisfy all three criteria for you.

Harsh Bafna answered Jan 22 '26 05:01
