I'm working on creating a MongoDB Docker image that contains a backup of the production data from my org's primary database. However, when I try to push this image up I am greeted with this error:
[root@ip-1-2-3-4 inf-tool-docker-mongo]# docker push 1234567.dkr.ecr.us-east-1.amazonaws.com/inf-data-mongo:2.6-latest
The push refers to repository [1234567.dkr.ecr.us-east-1.amazonaws.com/inf-data-mongo]
e429ba9ffbf8: Pushing [==================================================>] 87.35GB/87.35GB
fbd84d2027f9: Pushing [==================================================>] 87.35GB/87.35GB
4f8f8804b65d: Pushed
140b510fa705: Pushed
a2f3704a5dbf: Pushed
c362c0ad1002: Pushed
16817a92834f: Pushed
5a637bac3303: Pushed
32f938d6fb4c: Pushed
70d8dfa3043e: Pushed
denied: Adding this part to the layer with upload id '1234567890-12345-12345-123456-12345' in the repository with name 'inf-data-mongo' in registry with id '1234567890' exceeds the maximum allowed size of a layer which is '10737418240'
My image is about 85-100 GB in size, as there's a lot of data in it. The image runs fine locally, but when I go to push it up to AWS ECR I get this error.
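For reference, the per-layer sizes (including the two ~87 GB layers from the push output above) can be inspected with docker history:
docker history 1234567.dkr.ecr.us-east-1.amazonaws.com/inf-data-mongo:2.6-latest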
I've seen the Service Limits page here:
https://docs.aws.amazon.com/AmazonECR/latest/userguide/service_limits.html
However, it's worded a bit confusingly. Is there really nothing that I can do here? Surely I'm not the only one who wants to ship a large Docker image for convenience? What's my best path to move forward?
Thanks!
You should probably store your database content somewhere like S3 and ship it separately from the database Docker image.
Usually a Docker image contains only the program that's intended to be run; if there's persistent state associated with it (like a database's data files), that state is stored separately. You'd run your image with something like
docker run --name mongo -v $PWD/mongo:/data/db mongo
Generally if you've done this, you can docker stop the container, docker rm it, then docker run a new container against the same data store. If that will work, then it will also work to transplant the data to somewhere else.
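A minimal sketch of that cycle, reusing the same host directory as above:
docker stop mongo
docker rm mongo
docker run --name mongo -v $PWD/mongo:/data/db mongo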
So I'd suggest a workflow where you use an unmodified database image and separately distribute its data. You'd probably want to have a bootstrap script that looked something like
#!/bin/sh
set -e

# Name of the data snapshot to fetch; bump this when a new backup is published
SNAPSHOT=mongo-snapshot-20180831

# Download and unpack the snapshot only if we don't already have it locally
if [ ! -d "$SNAPSHOT" ]; then
    aws s3 cp "s3://my-volume/mongo/$SNAPSHOT.tar.gz" "$SNAPSHOT.tar.gz"
    tar xzf "$SNAPSHOT.tar.gz"
fi

# Run an unmodified MongoDB image against the unpacked data directory
docker run --name mongo -d -p 27017:27017 -v "$PWD/$SNAPSHOT":/data/db mongo:4.1
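Once the container is up, a quick way to check that the data is visible (assuming the container is named mongo as above and the image ships the mongo shell):
docker exec mongo mongo --eval 'db.adminCommand("listDatabases")'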
When I've tried to work with very large images in the past, docker build and docker push on images even as small as 2-4 GB ran into the sorts of troubles you're describing here (network failures, timeouts, and the like, even just copying the build context into the Docker daemon), and I'd say Docker really just doesn't work well with images sized in the gigabytes.
The solution that ended up working for my team was to have an /entrypoint.sh script for the Docker container that gets run as the ENTRYPOINT in the Dockerfile. The script checks whether this is the first time the image is being run in a container -- if so, it pulls the ~90 GB of database files down into the container locally. If it has already run before and has the files, it skips that step.
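A rough sketch of what such an entrypoint script can look like, assuming the image has the aws CLI and GNU tar installed; the bucket path, snapshot name, and .seeded marker file below are placeholders, not our actual setup:
#!/bin/sh
set -e

DATA_DIR=/data/db
SNAPSHOT=mongo-snapshot-20180831        # placeholder snapshot name
BUCKET=s3://my-volume/mongo             # placeholder bucket path

# Only hydrate the data directory the first time this container starts
if [ ! -f "$DATA_DIR/.seeded" ]; then
    aws s3 cp "$BUCKET/$SNAPSHOT.tar.gz" /tmp/snapshot.tar.gz
    # Assumes the tarball contains a single top-level directory (GNU tar)
    tar xzf /tmp/snapshot.tar.gz -C "$DATA_DIR" --strip-components=1
    rm /tmp/snapshot.tar.gz
    touch "$DATA_DIR/.seeded"
fi

# Hand off to whatever command the image was started with (e.g. mongod)
exec "$@"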
This is perfect: it keeps our AWS ECR repo thin, and if a developer needs the latest copy of production data, we have a way to deploy an image that will set itself up with the necessary data with minimal input.