 

Can't upload large Docker image to AWS ECR

I'm creating a MongoDB Docker image that contains a backup of the production data from my org's primary database. However, when I try to push this image, I'm greeted with this error:

[root@ip-1-2-3-4 inf-tool-docker-mongo]# docker push 1234567.dkr.ecr.us-east-1.amazonaws.com/inf-data-mongo:2.6-latest
The push refers to repository [1234567.dkr.ecr.us-east-1.amazonaws.com/inf-data-mongo]
e429ba9ffbf8: Pushing [==================================================>]  87.35GB/87.35GB
fbd84d2027f9: Pushing [==================================================>]  87.35GB/87.35GB
4f8f8804b65d: Pushed
140b510fa705: Pushed
a2f3704a5dbf: Pushed
c362c0ad1002: Pushed
16817a92834f: Pushed
5a637bac3303: Pushed
32f938d6fb4c: Pushed
70d8dfa3043e: Pushed
denied: Adding this part to the layer with upload id '1234567890-12345-12345-123456-12345' in the repository with name 'inf-data-mongo' in registry with id '1234567890' exceeds the maximum allowed size of a layer which is '10737418240'

My image is about 85-100 GB in size, as there's a lot of data in it. The Docker image runs fine, but when I go to push it to AWS ECR I get this error.

I've seen the Service Limits page here:

https://docs.aws.amazon.com/AmazonECR/latest/userguide/service_limits.html

However, it's worded a bit confusingly. Is there really nothing I can do here? Surely I'm not the only one who wants to ship a large Docker image for convenience? What's my best path forward?

Thanks!

— asked by Dan

2 Answers

You should probably store your database content somewhere like S3 and ship it separately from the database Docker image.

Usually a Docker image contains only the program that's intended to be run; any persistent state associated with it (like a database's data) is stored separately. You'd run your image with something like

docker run --name mongo -v $PWD/mongo:/data mongo

Generally, if you've done this, you can docker stop the container, docker rm it, then docker run a new container against the same data store. If that works, it will also work to transplant the data somewhere else.
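As a sketch of that lifecycle (the container name and ./mongo host directory just mirror the command above):

# start a database container with its data stored on the host
docker run --name mongo -d -v "$PWD/mongo:/data" mongo

# stop and delete the container; the files in ./mongo survive
docker stop mongo
docker rm mongo

# start a brand-new container against the same data directory
docker run --name mongo -d -v "$PWD/mongo:/data" mongo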

So I'd suggest a workflow where you use an unmodified database image and distribute its data separately. You'd probably want a bootstrap script that looks something like this:

#!/bin/sh
# Fetch and unpack the database snapshot if it isn't already on disk,
# then run an unmodified MongoDB image against it.
SNAPSHOT=mongo-snapshot-20180831
if [ ! -d "$SNAPSHOT" ]; then
  aws s3 cp "s3://my-volume/mongo/$SNAPSHOT.tar.gz" "$SNAPSHOT.tar.gz"
  tar xzf "$SNAPSHOT.tar.gz"
fi
docker run --name mongo -d -p 27017:27017 -v "$PWD/$SNAPSHOT:/data" mongo:4.1
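The other half of that workflow is producing the snapshot in the first place. A minimal sketch, reusing the same placeholder bucket and snapshot names as the bootstrap script above:

#!/bin/sh
# Package the (stopped) database's data directory and upload it to S3.
# The bucket and snapshot names are placeholders -- substitute your own.
SNAPSHOT=mongo-snapshot-20180831

docker stop mongo                      # make sure nothing is writing to the data
cp -a mongo "$SNAPSHOT"                # ./mongo is the host directory mounted at /data;
                                       # the archive's top-level directory must match $SNAPSHOT
tar czf "$SNAPSHOT.tar.gz" "$SNAPSHOT"
aws s3 cp "$SNAPSHOT.tar.gz" "s3://my-volume/mongo/$SNAPSHOT.tar.gz"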

When I've tried to work with very large images in the past, docker build and docker push ran into the sort of trouble you're describing here even on images as small as 2-4 GB (network failures, timeouts, and the like, even just copying the build context into the Docker daemon). I'd say Docker really just doesn't work well with any image sized in the gigabytes.

— answered by David Maze

The solution that ended up working for my team was to have an /entrypoint.sh script in the Docker container that runs as the ENTRYPOINT in the Dockerfile. The script checks whether this is the first time the image is being run in a container; if so, it pulls the ~90 GB of database files down to the container locally. If it has already run before and has the files, it skips that step.
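A minimal sketch of that approach, assuming the AWS CLI and credentials are available inside the image; the S3 path, marker file, and data directory are placeholders, not our actual setup:

#!/bin/sh
# /entrypoint.sh -- fetch the database files on first run, then start the server.
DATA_DIR=/data/db
MARKER="$DATA_DIR/.seeded"

if [ ! -f "$MARKER" ]; then
  echo "First run: downloading database snapshot..."
  aws s3 cp s3://my-volume/mongo/snapshot.tar.gz /tmp/snapshot.tar.gz
  # the tarball is assumed to unpack directly into database files
  tar xzf /tmp/snapshot.tar.gz -C "$DATA_DIR"
  rm /tmp/snapshot.tar.gz
  touch "$MARKER"
fi

# hand off to the server process (whatever CMD was given)
exec "$@"

And the corresponding Dockerfile wiring:

FROM mongo:4.1
# (the AWS CLI would also need to be installed in the image)
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["mongod"]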

This works well: it keeps our AWS ECR repository thin, but if a developer needs the latest copy of production data, we have a way to deploy an image that sets itself up with the necessary data with minimal input.

— answered by Dan