Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Troubleshoot AWS Fargate healthcheck for spring actuator

I have spring boot application with /health endpoint accessible deployed in AWS ECS Fargate. Sometimes the container is stopped with Task failed container health checks message. Sometimes happens once daily, sometimes once a week, maybe depends on the load. This is the healthcheck command specified in the Task Definition:

CMD-SHELL,curl -f http://localhost/actuator/health || exit 1

My question is how to troubleshoot what AWS receive when health-check is failed.

like image 878
smftr Avatar asked Oct 23 '25 14:10

smftr


2 Answers

In case anyone else lands here because of failing container health checks (not the same as ELB health checks), AWS provides some basic advice:

  • Check that the command works from inside the container. In my case I had not installed curl in the container image, but when I tested it from outside the container it worked fine, which fooled me into thinking it was working.
  • Check the task logs in CloudWatch

If the checks are only failing sometimes (especially under load), you can try increasing the timeout, but also check the task metrics (memory and CPU usage). Garbage collection can cause the task to pause, and if all the vCPUs are busy handling other requests, the health check may be delayed, so you may need to allocate more memory and/or vCPUs to the task.

like image 79
John Velonis Avatar answered Oct 27 '25 02:10

John Velonis


Thank @John Velonis, I don't have enough reputation for commenting on your answer, so I post that in a different answer

For my case, the ecs container keeps getting UNKNOWN status from the ecs cluster. But I can access the healthcheck successfully. when I read this post, and check my base image which is node:14.19.1-alpine3.14, it doesn't have curl command.

so I have to install that in the Dockerfile

RUN apk --no-cache add curl
like image 38
Minh Chau Avatar answered Oct 27 '25 00:10

Minh Chau