
Registered Targets Disappear

Tags:

amazon-eks

I have a working EKS cluster. It uses an ALB for ingress.

When I apply a service and then an ingress, most of them work as expected. However, some target groups eventually end up with no registered targets. If I look up the service's endpoints with kubectl describe svc my-service-name and manually register them in the target group, the pods are reachable again, but that's not a sustainable process.
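For reference, the manual workaround described above could be scripted roughly as follows. This is only a sketch: to_targets is a hypothetical helper name, TG_ARN stands for the ARN of the affected target group, and port 3000 is taken from the targetPort in service.yaml below.

```shell
#!/bin/bash
# Hypothetical helper: read pod IPs (one per line) on stdin and print them
# in the Id=...,Port=... form accepted by `aws elbv2 register-targets --targets`.
to_targets() {
  local port="$1"
  while read -r ip; do
    if [ -n "$ip" ]; then
      printf 'Id=%s,Port=%s\n' "$ip" "$port"
    fi
  done
}

# Usage (TG_ARN is an assumption; set it to the target group's ARN):
#   kubectl get endpoints site-bob -n next-sites \
#     -o jsonpath='{range .subsets[*].addresses[*]}{.ip}{"\n"}{end}' \
#     | to_targets 3000 \
#     | xargs aws elbv2 register-targets --target-group-arn "$TG_ARN" --targets
```

This only re-registers the current pod IPs, so it has to be repeated whenever pods cycle.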

Any ideas on what might be happening? Why doesn't EKS find the target groups as pods cycle?

Each service (secrets, deployment, service and ingress) consists of a set of .yaml files applied like:

deploy.sh

#!/bin/bash
set -e

kubectl apply -f ./secretsMap.yaml
kubectl apply -f ./configMap.yaml
kubectl apply -f ./deployment.yaml
kubectl apply -f ./service.yaml
kubectl apply -f ./ingress.yaml

service.yaml

apiVersion: v1
kind: Service
metadata:
  name: "site-bob"
  namespace: "next-sites"
spec:
  ports:
    - port: 80
      targetPort: 3000
      protocol: TCP
  type: NodePort
  selector:
    app: "site-bob"

ingress.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: "site-bob"
  namespace: "next-sites"
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/tags: Environment=Production,Group=api
    alb.ingress.kubernetes.io/backend-protocol: HTTP
    alb.ingress.kubernetes.io/ip-address-type: ipv4
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80},{"HTTPS":443}]'
    alb.ingress.kubernetes.io/load-balancer-name: eks-ingress-1
    alb.ingress.kubernetes.io/group.name: eks-ingress-1
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-2:402995436123:certificate/9db9dce3-055d-4655-842e-xxxxx
    alb.ingress.kubernetes.io/healthcheck-port: traffic-port
    alb.ingress.kubernetes.io/healthcheck-path: /
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: '30'
    alb.ingress.kubernetes.io/healthcheck-timeout-seconds: '16'
    alb.ingress.kubernetes.io/success-codes: 200,201
    alb.ingress.kubernetes.io/healthy-threshold-count: '2'
    alb.ingress.kubernetes.io/unhealthy-threshold-count: '2'
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=60
    alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30
    alb.ingress.kubernetes.io/actions.ssl-redirect: >
      {
        "type": "redirect", 
        "redirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}
      }

    
    alb.ingress.kubernetes.io/actions.svc-host: >
      {
        "type":"forward",
        "forwardConfig":{
          "targetGroups":[
            {
              "serviceName":"site-bob",
              "servicePort": 80,"weight":20}
          ],
          "targetGroupStickinessConfig":{"enabled":true,"durationSeconds":200}
        }
      }
  labels:
    app: site-bob
spec:
  rules:
    - host: "staging-bob.imgeinc.net"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ssl-redirect
                port: 
                  name: use-annotation
          - backend:
              service:
                name: svc-host
                port:
                  name: use-annotation
            pathType: ImplementationSpecific

Asked Aug 31 '25 22:08 by bknights


1 Answer

Something in my configuration had tagged two security groups as owned by the cluster, while the load balancer controller expects exactly one per ENI. When I checked the load balancer controller logs:

kubectl logs -n kube-system aws-load-balancer-controller-677c7998bb-l7mwb

I saw many lines like:

{"level":"error","ts":1641996465.6707578,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"k8s-nextsite-sitefest-89a6f0ff0a","namespace":"next-sites","error":"expect exactly one securityGroup tagged with kubernetes.io/cluster/imageinc-next-eks-4KN4v6EX for eni eni-0c5555fb9a87e93ad, got: [sg-04b2754f1c85ac8b9 sg-07b026b037dd4d6a4]"}
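If the log is long, the security group IDs named in these errors can be pulled out with a small filter. This is a sketch rather than anything the controller provides: conflicting_sgs is a hypothetical helper name, and the deployment name in the usage comment is assumed from the pod name above.

```shell
#!/bin/bash
# Hypothetical helper: read controller log lines on stdin and print the
# unique security group IDs mentioned in the
# "expect exactly one securityGroup tagged" errors.
conflicting_sgs() {
  grep 'expect exactly one securityGroup tagged' \
    | grep -o 'sg-[0-9a-f]\{1,\}' \
    | sort -u
}

# Usage (deployment name is an assumption; adjust it to your install):
#   kubectl logs -n kube-system deployment/aws-load-balancer-controller | conflicting_sgs
```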

sg-07b026b037dd4d6a4 has description: EKS created security group applied to ENI that is attached to EKS Control Plane master nodes, as well as any managed workloads.

sg-04b2754f1c85ac8b9 has description: Security group for all nodes in the cluster.
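The same information can probably be found without the logs by asking EC2 which security groups carry the cluster tag. A minimal sketch, assuming the AWS CLI is configured for the cluster's account and region; list_cluster_sgs is a hypothetical helper name and the cluster name is passed as an argument:

```shell
#!/bin/bash
# Hypothetical helper: list every security group that has the
# kubernetes.io/cluster/<cluster name> tag key, with its name and description.
list_cluster_sgs() {
  local cluster="$1"
  aws ec2 describe-security-groups \
    --filters "Name=tag-key,Values=kubernetes.io/cluster/${cluster}" \
    --query 'SecurityGroups[].[GroupId,GroupName,Description]' \
    --output text
}

# Usage:
#   list_cluster_sgs imageinc-next-eks-4KN4v6EX
```

This lists tagged groups for the whole cluster, whereas the controller error is about the groups attached to one ENI, so more than one result is a hint to investigate rather than proof of the problem.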

I removed the tag:

{
    Key: 'kubernetes.io/cluster/<cluster name>',
    Value: 'owned'
}

from sg-04b2754f1c85ac8b9

and the TargetGroups started to fill in and everything is now working. Both groups were created and tagged by Terraform. I suspect my worker group configuration is off.
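For anyone who prefers the CLI to the console, removing the tag might look like the sketch below. untag_sg is a hypothetical helper name, and the AWS CLI is assumed to be configured for the right account and region. Since the tags here came from Terraform, they may well reappear on the next apply unless the Terraform configuration is changed too.

```shell
#!/bin/bash
# Hypothetical helper: remove the kubernetes.io/cluster/<cluster name> tag
# from one security group. Omitting the tag value deletes the tag
# regardless of what its value is.
untag_sg() {
  local sg_id="$1" cluster="$2"
  aws ec2 delete-tags \
    --resources "$sg_id" \
    --tags "Key=kubernetes.io/cluster/${cluster}"
}

# Usage:
#   untag_sg sg-04b2754f1c85ac8b9 imageinc-next-eks-4KN4v6EX
# Afterwards, check that the bindings still exist and the errors stop:
#   kubectl get targetgroupbindings -n next-sites
```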

Answered Sep 05 '25 07:09 by bknights


