I'm stumped by a question I received while working on an open-source Dockerfile, which boils down to "why did you change the layers?", so I'm trying to answer it with my own investigation.
I apologize that the subject is not well defined, but essentially it's about how docker layers relate to the docker-cache.
So I'm looking for an elegant explanation in an area which isn't well documented.
My changes to the original Dockerfile were to separate ENV into different layers, move a COPY earlier, and expose the port later.
The original (simplified):
FROM ubuntu:latest
EXPOSE 80
ENV HELLO=world \
    DOCKER=whale
RUN # Run stuff
COPY source /to/container
CMD # Do stuff
My changes:
FROM ubuntu:latest
ENV HELLO world
# Separate ENV into different layers
ENV DOCKER whale
# Less prone to change, move earlier
COPY source /to/container
RUN # Run stuff
# "Bake in" port later
EXPOSE 80
CMD # Do stuff
It's my understanding that, from a docker-cache perspective, separating the ENV variables into different layers is good practice because, if a user wants to override an ENV, only that one ENV needs to change within its own layer, instead of altering the entire layer that contains all the ENVs for the sake of one.
But adding the port EXPOSE later - it just feels right. This is because I've used Docker for about 18 months, and nearly all of Docker's docs and guides expose the port later in the Dockerfile.
I'm also led to believe, based on my experience (and on attending DockerCon2017 and participating in some "best practice" classes), that layers more prone to changes/overrides should be placed later in a Dockerfile, to better optimize the docker-cache so there aren't so many low-level layer variations.
Is my belief correct (or foolish) that separating ENV layers, moving COPY earlier, and placing EXPOSE later is good practice and an overall improvement on the original Dockerfile's layers from the perspective of optimizing the Docker cache?
While this question has some heavily opinionated possible answers, I'll attempt to keep to facts and other things sourced from Docker's docs on this.
Proper layering of layers in docker has essentially three goals (roughly ordered):
apt operations should always start with apt-get update && ... and apt-get update should never be in a separate RUN layer.
Given that, here are some observations on the changes you've proposed:
ENV layers
Given (2) above, you should keep ENV layers combined when possible. Users can override variables with --env at runtime, which does not affect build-time layering. Yes, if one of the ENV lines were modified in the source it would invalidate the cache for the rest of the file (3), but generally this is traded off for performance reasons.
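As a minimal sketch of the combined form, the two variables from the question can be set by a single ENV instruction. The key=value syntax is used here because it allows several variables in one instruction, and the image name in the comment is hypothetical.

```dockerfile
FROM ubuntu:latest

# A single ENV instruction sets both variables in one build step.
ENV HELLO=world \
    DOCKER=whale

# Overriding a value happens at run time and does not touch the build
# or its cache (example/image is a hypothetical image name):
#   docker run --env HELLO=there example/image
```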
COPY up
Generally this is not a good idea. The source on disk is among the most likely things to change, and if the source changes, every layer from the COPY layer downwards is invalidated.
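A rough sketch of the ordering this implies, assuming the RUN step only installs packages and does not need the copied source. The package (curl) and the CMD are placeholders, not taken from the original Dockerfile.

```dockerfile
FROM ubuntu:latest

# apt-get update and the install share one RUN instruction, so a stale
# cached package index is never paired with a changed install line.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# The source changes most often, so it is copied as late as possible.
COPY source /to/container

CMD ["ls", "/to/container"]
```

With this order, editing a file under source invalidates only the COPY layer and what follows it, while the slower apt layer stays cached.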
EXPOSE
This really doesn't matter. EXPOSE is a nearly trivial layer (it in fact does nothing unless you're linking containers). Since it is cacheable, I'd put it near the top, but again, it's trivial to compute and doesn't really change.
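For illustration, a sketch with EXPOSE near the top. The image name in the comments is hypothetical; the commented docker run lines show that actually publishing the port is decided at run time, not by the instruction's position.

```dockerfile
FROM ubuntu:latest

# Metadata only: documents the port, publishes nothing by itself.
EXPOSE 80

COPY source /to/container

# Publishing is a runtime decision (example/image is hypothetical):
#   docker run -p 8080:80 example/image   # map host port 8080 to container port 80
#   docker run -P example/image           # publish each EXPOSEd port on an ephemeral host port
```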
tl;dr The maintainer is correct in saying no to all three changes, as they will make build and run performance worse.