Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Readiness and Liveness probes for elasticsearch 6.3.0 on Kubernetes failing

I am trying to setup EFK stack on Kubernetes . The Elasticsearch version being used is 6.3.2. Everything works fine until I place the probes configuration in the deployment YAML file. I am getting error as below. This is causing the pod to be declared unhealthy and eventually gets restarted which appears to be a false restart.

Warning Unhealthy 15s kubelet, aks-agentpool-23337112-0 Liveness probe failed: Get http://10.XXX.Y.ZZZ:9200/_cluster/health: dial tcp 10.XXX.Y.ZZZ:9200: connect: connection refused

I did try using telnet from a different container to the elasticsearch pod with IP and port and I was successful but only kubelet on the node is unable to resolve the IP of the pod causing the probes to fail.

Below is the snippet from the pod spec of the Kubernetes Statefulset YAML. Any assistance on the resolution would be really helpful. Spent quite a lot of time on this without any clue :(

PS: The stack is being setup on AKS cluster

      - name: es-data
        image: quay.io/pires/docker-elasticsearch-kubernetes:6.3.2
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: CLUSTER_NAME
          value: myesdb
        - name: NODE_MASTER
          value: "false"
        - name: NODE_INGEST
          value: "false"
        - name: HTTP_ENABLE
          value: "true"
        - name: NODE_DATA
          value: "true"
        - name: DISCOVERY_SERVICE
          value: "elasticsearch-discovery"
        - name: NETWORK_HOST
          value: "_eth0:ipv4_"          
        - name: ES_JAVA_OPTS
          value: -Xms512m -Xmx512m
        - name: PROCESSORS
          valueFrom:
            resourceFieldRef:
              resource: limits.cpu
        resources:
          requests:
            cpu: 0.25
          limits:
            cpu: 1
        ports:
        - containerPort: 9200
          name: http
        - containerPort: 9300
          name: transport
        livenessProbe:
          httpGet:
            port: http
            path: /_cluster/health
          initialDelaySeconds: 40
          periodSeconds: 10
       readinessProbe:
         httpGet:
           path: /_cluster/health
           port: http
         initialDelaySeconds: 30
         timeoutSeconds: 10 

The pods/containers runs just fine without the probes in place . Expectation is that the probes should work fine when set on the deployment YAMLs and the POD should not get restarted.

like image 361
karthik ravi Avatar asked Jan 26 '26 03:01

karthik ravi


1 Answers

The thing is that ElasticSearch itself has own health statuses (red, yellow, green) and you need to consider that in your configuration.

Here what I found in my own ES configuration, based on the official ES helm chart:

        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 40
          periodSeconds: 10
          successThreshold: 3
          timeoutSeconds: 5

          exec:
            command:
              - sh
              - -c
              - |
                #!/usr/bin/env bash -e
                # If the node is starting up wait for the cluster to be green
                # Once it has started only check that the node itself is responding
                START_FILE=/tmp/.es_start_file

                http () {
                    local path="${1}"
                    if [ -n "${ELASTIC_USERNAME}" ] && [ -n "${ELASTIC_PASSWORD}" ]; then
                      BASIC_AUTH="-u ${ELASTIC_USERNAME}:${ELASTIC_PASSWORD}"
                    else
                      BASIC_AUTH=''
                    fi
                    curl -XGET -s -k --fail ${BASIC_AUTH} http://127.0.0.1:9200${path}
                }

                if [ -f "${START_FILE}" ]; then
                    echo 'Elasticsearch is already running, lets check the node is healthy'
                    http "/"
                else
                    echo 'Waiting for elasticsearch cluster to become green'
                    if http "/_cluster/health?wait_for_status=green&timeout=1s" ; then
                        touch ${START_FILE}
                        exit 0
                    else
                        echo 'Cluster is not yet green'
                        exit 1
                    fi
                fi
like image 154
Alik Khilazhev Avatar answered Jan 29 '26 13:01

Alik Khilazhev