Machine Learning Tools Docker Image Size Issue

I need a Docker container with the following packages installed for some computational analysis. The packages are listed in the requirements.txt file:

boto3 = "*"
nltk ="*"
pandas = "*"
scikit-learn = "*"
sentence_transformers = "*"
spacy = {extras = ["lookups"],version = "*"}
streamlit = "*"
tensorflow = "*"
unidecode = "*"

I have written a Dockerfile for this. The issue I am facing is the size of the Docker image, which is around 6 GB (6.42 GB exactly). Can anybody help me reduce the size of the Docker image?

Here is the Dockerfile:

FROM python:3.7-slim-buster as base

ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"

COPY . /opt/program

WORKDIR /opt/program/

RUN chmod +x train

# Install dependencies
RUN apt-get update \
    && apt-get upgrade -y \
    && apt-get autoremove -y \
    && apt-get install -y \
    gcc \
    build-essential \
    zlib1g-dev \
    wget \
    unzip \
    cmake \
    python3-dev \
    gfortran \
    libblas-dev \
    liblapack-dev \
    libatlas-base-dev \
    && apt-get clean

# Install Python packages
RUN pip install --upgrade pip \
    && pip install \
    ipython[all] \
    nose \
    matplotlib \
    pandas \
    scipy \
    sympy \
    && rm -fr /root/.cache

RUN pip install --install-option="--prefix=/install" -r requirements.txt

Saad asked May 22 '26 21:05


1 Answer

Here are some techniques collected from other people's Dockerfiles and from documentation:

  • Delete the apt cache

Run rm -rf /var/lib/apt/lists/* in the same RUN instruction as apt-get install, for example:

RUN apt-get update && apt-get install -y \
        ca-certificates \
        netbase \
    && rm -rf /var/lib/apt/lists/*

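Not like this, because each RUN creates its own layer and the files deleted by the second RUN still take up space in the layer created by the first: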

RUN apt-get update && apt-get install -y \
      ca-certificates \
      netbase
RUN rm -rf /var/lib/apt/lists/*

  • Use --no-install-recommends

RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates \
        netbase \
    && rm -rf /var/lib/apt/lists/*

--no-install-recommends tells apt not to install the recommended (non-essential) packages alongside the ones you ask for.

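  • Remove build-time-only software

e.g.:

RUN apt-get update && apt-get install -y --no-install-recommends \
        gcc \
        g++ \
    && pip install cython \
    && apt-get purge -y --auto-remove gcc g++ \
    && rm -rf /var/lib/apt/lists/*

Some software, such as gcc, is only needed while other packages are being installed, so it can be removed once the installation has finished. The removal has to happen in the same RUN instruction as the install; otherwise the files stay behind in an earlier layer.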
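Another way to keep compilers out of the final image is a multi-stage build: the build tools exist only in a first stage, and only the installed Python packages are copied into the final image. A minimal sketch, assuming the dependencies are listed in requirements.txt; the stage name builder and the /opt/venv path are made up for illustration:

```dockerfile
# Stage 1: contains the compilers and installs the Python packages
FROM python:3.7-slim-buster AS builder

RUN apt-get update && apt-get install -y --no-install-recommends \
        gcc \
        g++ \
    && rm -rf /var/lib/apt/lists/*

# Install into a virtual environment so everything sits in one directory
# (/opt/venv is an illustrative path, not something from the question)
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:${PATH}"

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: the final image, without gcc and g++
FROM python:3.7-slim-buster

# Copy only the installed packages from the first stage
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:${PATH}"
```

Both stages use the same base image, so the copied virtual environment finds the same Python interpreter. Packages that load system libraries at run time would still need those libraries (not the -dev packages) installed in the final stage.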

  • Tell pip not to keep a cache

e.g.:


RUN pip install --no-cache-dir -r requirements.txt

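  • Download and delete rather than COPY?

I am not sure about this one. In other people's Dockerfiles, files are often downloaded, used, and deleted within a single RUN instruction instead of being copied into the image. A COPY creates its own layer, so a file that is copied in and deleted later still counts toward the image size.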

  • Do not bake model data into the image

If you use TensorFlow or another AI application, you may have model data that is several GB in size. It is better to download it when the container runs (over FTP, from object storage, or some other way) or to mount it, rather than putting it in the image.
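One way to follow this advice is to leave the model out of the image and mount it when the container starts. This is only a sketch: the /models path, the MODEL_DIR variable name, and the image name my-image are assumptions, and the application has to read MODEL_DIR itself.

```dockerfile
FROM python:3.7-slim-buster

WORKDIR /opt/program
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .

# The model files are NOT copied into the image.
# They are expected to be mounted at run time, for example:
#   docker run -v /path/on/host/models:/models my-image
# (MODEL_DIR and /models are illustrative names)
ENV MODEL_DIR=/models
VOLUME /models

CMD ["python", "app.py"]
```

The image then stays the same size no matter how large the model is, and the model can be replaced without rebuilding.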

  • Watch out for the .git folder

This is just from my experience. If you use git for version control, the .git folder can be very large, and COPY . /XXX copies .git into the image. Find a way to filter it out. This is what I use:


FROM alpine:3.12 AS MID
COPY XXX /XXX/
COPY ... /XXX/

FROM image:youneed
COPY --from=MID /XXX/ /XXX/ 
RUN apt-get update && xxxxx

CMD ["python","app.py"]

Or use a .dockerignore file.
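With a .dockerignore file next to the Dockerfile, a single line containing .git is enough to keep that folder out of the build context. A further option is to name the files to copy instead of copying the whole directory. A minimal sketch, reusing requirements.txt and the train script from the question and assuming nothing else from the project folder is needed:

```dockerfile
FROM python:3.7-slim-buster

WORKDIR /opt/program

# Copy only the files the image needs instead of "COPY . /opt/program",
# so .git and other stray files in the project folder never reach the image.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# "train" is the script from the question's Dockerfile
COPY train .
RUN chmod +x train
```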

These tips come from:

  • Python:3.6-slim

Applied to your Dockerfile:

# Are wget, cmake, and some of the others really necessary?

COPY . /opt/program

WORKDIR /opt/program/

# Install dependencies.
# The apt-get remove line is just a first try; experiment to find out
# which packages can be removed.
# Note: pip 23.1 and later no longer accept --install-option;
# with those versions, pass --prefix=/install directly.
RUN chmod +x train && apt-get update \
    && apt-get upgrade -y \
    && apt-get autoremove -y \
    && apt-get install -y \
    gcc \
    build-essential \
    zlib1g-dev \
    wget \
    unzip \
    cmake \
    python3-dev \
    gfortran \
    libblas-dev \
    liblapack-dev \
    libatlas-base-dev \
    && apt-get clean && pip install --upgrade pip \
    && pip install --no-cache-dir \
    ipython[all] \
    nose \
    matplotlib \
    pandas \
    scipy \
    sympy \
    && pip install --no-cache-dir --install-option="--prefix=/install" -r requirements.txt \
    && apt-get remove -y gcc unzip cmake \
    && rm -rf /var/lib/apt/lists/* \
    && rm -fr /root/.cache

Of course, this gives you a smaller image, but because everything sits in one RUN instruction, the build cannot reuse Docker's layer cache for the individual steps. While you are still experimenting to find out which packages can be removed, split it into two or three RUN instructions so that more of the build is cached.

Hope this helps.
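The split described above could look roughly like this while experimenting. It is a sketch, not the final Dockerfile: the package list is shortened for illustration, and the compilers stay in the image until the steps are merged back into one RUN.

```dockerfile
FROM python:3.7-slim-buster

# Layer 1: system packages. Reused from the cache as long as this
# instruction and the base image stay the same.
RUN apt-get update && apt-get install -y --no-install-recommends \
        gcc \
        build-essential \
    && rm -rf /var/lib/apt/lists/*

# Layer 2: Python packages. Rebuilt only when requirements.txt changes.
WORKDIR /opt/program
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Layer 3: the application code, which changes most often, goes last.
COPY . /opt/program
RUN chmod +x train
```

Once it is clear which packages can go, the install and the removal need to be merged back into a single RUN instruction, because a removal in a later layer does not shrink the image.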

lanhao945 answered May 25 '26 10:05


