
How to fill elasticsearch database before starting webapp with docker-compose?

I am trying to make a Dockerfile and docker-compose.yml for a webapp that uses elasticsearch. I have connected elasticsearch to the webapp and exposed it to host. However, before the webapp runs I need to create elasticsearch indices and fill them. I have 2 scripts to do this, data_scripts/createElasticIndex.js and data_scripts/parseGenesToElastic.js. I tried adding these to the Dockerfile with

CMD [ "node", "data_scripts/createElasticIndex.js"]
CMD [ "node", "data_scripts/parseGenesToElastic.js"]
CMD ["npm", "start"]

but after I run docker-compose up there are no indexes made. How can I fill elasticsearch before running the webapp?

Dockerfile:

FROM node:11.9.0

# Set the working directory to /app
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY package*.json ./

# Install any needed packages specified in package.json
RUN npm install
# If you are building your code for production
# RUN npm ci --only=production

#

RUN npm build
RUN npm i natives

# Bundle app source
COPY . .

# Make port 80 available to the world outside this container
EXPOSE 80

# Run the app when the container launches
CMD [ "node", "data_scripts/createElasticIndex.js"]
CMD [ "node", "data_scripts/parseGenesToElastic.js"]
CMD [ "node", "servers/PredictionServer.js"]
CMD [ "node", "--max-old-space-size=8192", "servers/PWAServerInMem.js"]
CMD ["npm", "start"]

docker-compose.yml:

version: "3"
services:
  web:
    # replace username/repo:tag with your name and image details
    image: webapp
    ports:
      - "1337:1337"
      - "4000:85"
    depends_on:
      - redis
      - elasticsearch
    networks:
      - redis
      - elasticsearch
    volumes:
        - "/data:/data"
    environment:
      - "discovery.zen.ping.unicast.hosts=elasticsearch"
      - ELASTICSEARCH_URL=http://elasticsearch:9200
      - ELASTICSEARCH_HOST=elasticsearch
  redis:
    image: redis
    networks:
      - redis
    ports:
      - "6379:6379"
    expose:
      - "6379"
  elasticsearch:
    image: elasticsearch:2.4
    ports:
      - 9200:9200
      - 9300:9300
    expose:
      - "9200"
      - "9300"
    networks:
      - elasticsearch

networks:
  redis:
    driver: bridge
  elasticsearch:
    driver: bridge
asked Jan 19 '26 by Niek de Klein

2 Answers

A Docker container only ever runs one command. When your Dockerfile has multiple CMD lines, only the last one has any effect, and the rest are ignored. (ENTRYPOINT here is just a different way to provide the single command; if you specify both ENTRYPOINT and CMD then the entrypoint becomes the main process and the command is passed as arguments to it.)
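As a concrete illustration of the rule (a minimal sketch, not your full Dockerfile):

```dockerfile
FROM node:11.9.0
WORKDIR /app
COPY . .

# Only the last CMD wins: the two seed scripts below are never run.
CMD ["node", "data_scripts/createElasticIndex.js"]
CMD ["node", "data_scripts/parseGenesToElastic.js"]

# This is the only command the container actually executes.
CMD ["npm", "start"]
```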

Given the example you show, I'd run this in three steps:

  1. Start only the database

    docker-compose up -d elasticsearch
    
  2. Run the "seed" jobs. For simplicity I'd probably run them locally

    ELASTICSEARCH_URL=http://localhost:9200 node data_scripts/createElasticIndex.js
    

    (this uses localhost because the script runs directly on your physical host, together with the port published from the container), but if you prefer you can also run them via the Docker setup

    docker-compose run web node data_scripts/createElasticIndex.js
    
  3. Once the database is set up, start your whole application

    docker-compose up -d
    

    This will leave the running Elasticsearch unaffected, and start the other containers.
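The host-versus-container addressing difference from step 2 can be absorbed in the seed scripts themselves by reading the endpoint from the environment; a minimal sketch (the helper name elasticsearchUrl is made up):

```javascript
// Resolve the Elasticsearch endpoint: use ELASTICSEARCH_URL if set
// (e.g. http://localhost:9200 when run directly on the host), otherwise
// fall back to the service name from docker-compose.yml, which resolves
// inside the Compose network.
function elasticsearchUrl() {
  return process.env.ELASTICSEARCH_URL || "http://elasticsearch:9200";
}

console.log("Connecting to", elasticsearchUrl());
```

The same script then works unchanged in both places, with only the environment variable differing.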

An alternate pattern, if you're confident you want to run these "seed" or migration jobs on every single container start, is to write an entrypoint script. The basic pattern here is to start your server via CMD as you have it now, but to write a script that does first-time setup, ending in exec "$@" to run the command, and make that your container's ENTRYPOINT. This could look like

#!/bin/sh
# I am entrypoint.sh

# Stop immediately if any of these scripts fail
set -e

# Run the migration/seed jobs
node data_scripts/createElasticIndex.js
node data_scripts/parseGenesToElastic.js

# Run the CMD / `docker run ...` command
exec "$@"
# I am Dockerfile
FROM node:11.9.0
...
COPY entrypoint.sh ./       # if not already copied
RUN chmod +x entrypoint.sh  # if not already executable
ENTRYPOINT ["/app/entrypoint.sh"]
CMD ["npm", "start"]

Since the entrypoint script really is just a shell script, you can put arbitrary logic in it, for instance running the seed jobs only when the main server is being started (if [ "$1" = "npm" ]; then ... fi) and skipping them for debugging shells (docker run --rm -it myimage bash).

Your Dockerfile also looks like you might be trying to start three different servers (PredictionServer.js, PWAServerInMem.js, and whatever npm start starts); you can run these in three separate containers from the same image and specify the command: in each docker-compose.yml block.
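A sketch of that pattern, assuming the image is named webapp as in your file (the service names here are made up):

```yaml
services:
  web:
    image: webapp
    command: ["npm", "start"]
  prediction:
    image: webapp
    command: ["node", "servers/PredictionServer.js"]
  pwa:
    image: webapp
    command: ["node", "--max-old-space-size=8192", "servers/PWAServerInMem.js"]
```

Each container runs exactly one of the servers, and all three share the same build.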

Your docker-compose.yml will be simpler if you remove the networks: (unless it's vital to you that your Elasticsearch and Redis can't talk to each other; it usually isn't) and the expose: declarations (which do nothing, especially in the presence of ports:).
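With both of those removed (Compose puts all services on a shared default network, where they can already reach each other by service name), the file could shrink to something like:

```yaml
version: "3"
services:
  web:
    image: webapp
    ports:
      - "1337:1337"
      - "4000:85"
    depends_on:
      - redis
      - elasticsearch
    environment:
      - ELASTICSEARCH_URL=http://elasticsearch:9200
      - ELASTICSEARCH_HOST=elasticsearch
  redis:
    image: redis
    ports:
      - "6379:6379"
  elasticsearch:
    image: elasticsearch:2.4
    ports:
      - "9200:9200"
      - "9300:9300"
```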

answered Jan 21 '26 by David Maze

I faced the same issue and started out with the same approach posted here.

I was redesigning some queries, which required frequent changes to the index settings and property mappings, plus changes to the dataset I was using as an example.

I searched for a Docker image I could add to my docker-compose file that would let me change anything in the index settings or the example dataset, then simply run docker-compose up and see the changes in my local Kibana.

I found nothing, so I ended up creating my own. I'm sharing it here because it answers the question, and I hope it helps someone else with the same issue.

You can use it as follows:

 elasticsearch-seed:
    container_name: elasticsearch-seed
    image: richardsilveira/elasticsearch-seed
    environment:
      - ELASTICSEARCH_URL=http://elasticsearch:9200
      - INDEX_NAME=my-index
    volumes:
      - ./my-custom-index-settings.json:/seed/index-settings.json
      - ./my-custom-index-bulk-payload.json:/seed/index-bulk-payload.json

Your index settings file should contain both the index settings and the type mappings, as usual, and your bulk payload file should contain your example data.
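For reference, a minimal index-settings.json for Elasticsearch 2.x might look like this (the gene type and name field are only placeholders, not something the image requires):

```json
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "gene": {
      "properties": {
        "name": { "type": "string" }
      }
    }
  }
}
```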

More instructions are available at the elasticsearch-seed GitHub repository.

It can even be used in E2E and integration test scenarios running in CI pipelines.

answered Jan 21 '26 by Richard Lee


