I need a docker container with the following packages installed on it for some sort of computational analysis. The packages listed below are inside the requirements.txt file.
boto3 = "*"
nltk ="*"
pandas = "*"
scikit-learn = "*"
sentence_transformers = "*"
spacy = {extras = ["lookups"],version = "*"}
streamlit = "*"
tensorflow = "*"
unidecode = "*"
I have write a Dockerfile for this thing, The issue here I am facing is the size of the Docker Image which is around 6 GB (6.42 exactly). Can anybody help me with this issue, How I can reduce the size of the Docker Image.
Here is the DockerFile
FROM python:3.7-slim-buster as base
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"
COPY . /opt/program
WORKDIR /opt/program/
RUN chmod +x train
# Install dependencies
RUN apt-get update \
&& apt-get upgrade -y \
&& apt-get autoremove -y \
&& apt-get install -y \
gcc \
build-essential \
zlib1g-dev \
wget \
unzip \
cmake \
python3-dev \
gfortran \
libblas-dev \
liblapack-dev \
libatlas-base-dev \
&& apt-get clean
# Install Python packages
RUN pip install --upgrade pip \
&& pip install \
ipython[all] \
nose \
matplotlib \
pandas \
scipy \
sympy \
&& rm -fr /root/.cache
RUN pip install --install-option="--prefix=/install" -r requirements.txt
do rm -rf /var/lib/apt/lists/* after you run apt-install ,such as
RUN apt-get update && apt-get install -y \
ca-certificates \
netbase \
&& rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y \
ca-certificates \
netbase
RUN rm -rf /var/lib/apt/lists/*
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
netbase \
&& rm -rf /var/lib/apt/lists/*
no-install-recommends means : do not install non-essential dependency packages.
egg:
RUN apt-get update && apt-get install -y --no-install-recommends \
gcc \
g++ \
&& pip install cython && apt-get remove -y gcc g++ \
&& rm -rf /var/lib/apt/lists/*
Some software ,like gcc,only use when install some software,we can remove it after install finish.
egg:
RUN pip install --no-cache-dir -r requirements.txt
I am not sure it.From other's Dockerfile, they download file and finally delete it after use in one RUN,not copy file in it.
If you use tensorflow or other AI application,you may have some model data(size is a few G),better way is download it when run in container or by ftp,object storage,or others way —— not in image,just mount or download.
Just in my experience. If you use git to contorl codes. The .git folder may very very big. The command COPY . /XXX will copy .git to image.Find a way to filter the .git.For my use:
FROM apline:3.12 as MID
COPY XXX /XXX/
COPY ... /XXX/
FROM image:youneed
COPY --from=MID /XXX/ /XXX/
RUN apt-get update && xxxxx
CMD ["python","app.py"]
or use .dockerignore.
# Did wget,cmake and some on is necessary?
COPY . /opt/program
WORKDIR /opt/program/
# Install dependencies
RUN chmod +x train && apt-get update \
&& apt-get upgrade -y \
&& apt-get autoremove -y \
&& apt-get install -y \
gcc \
build-essential \
zlib1g-dev \
wget \
unzip \
cmake \
python3-dev \
gfortran \
libblas-dev \
liblapack-dev \
libatlas-base-dev \
&& apt-get clean && pip install --upgrade pip \
&& pip install --no-cache-dir \
ipython[all] \
nose \
matplotlib \
pandas \
scipy \
sympy \
&& pip install --no-cache-dir --install-option="--prefix=/install" -r requirements.txt
&& apt-get remove -y gcc unzip cmake \ # just have a try,to find what software we can remove.
&& rm -rf /var/lib/apt/lists/*
&& rm -fr /root/.cache
Of course, by this way, you may get a just smaller size image,but docker build process, will not use docker's cache .So during you try to find what software can delete, split to two or three commands RUN to use more docker cache.
Hope to help you.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With