
ECS Execution Role causes log driver failure during container startup?

When we use a custom IAM role as an ECS Task Definition's execution role, the resulting Service fails to start on our ECS instance because the CloudWatch logging driver cannot be initialized. Specifically, we see the following errors from the ECS agent in CloudWatch:

2019-10-24T21:43:10Z [INFO] TaskHandler: Adding event: TaskChange: [arn:aws:ecs:us-west-1:REDACTED -> STOPPED, Known Sent: NONE, PullStartedAt: 2019-10-24 21:43:08.499577397 +0000 UTC m=+187.475751716, PullStoppedAt: 2019-10-24 21:43:09.69279918 +0000 UTC m=+188.668973506, ExecutionStoppedAt: 2019-10-24 21:43:10.153954812 +0000 UTC m=+189.130129126, arn:aws:ecs:us-west-1:REDACTED wordpress -> STOPPED, Reason CannotStartContainerError: Error response from daemon: failed to initialize logging driver: CredentialsEndpointError: failed to load credentials

caused by: Get http://169.254.170.2/v2/credentials/REDACTED: dial tcp 169.254.170.2:80: connect: connection refused, Known Sent: NONE] sent: false

This "connection refused" error used to be a timeout error. It changed when, after reading about similar problems, I tried to debug the issue by adding the iptables entries from https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-install.html, even though this is an Amazon ECS provisioned CoreOS EC2 instance (not a custom one).

Essentially, that link and other issues similar to mine recommended the following rules, which at least changed the error:

ubuntu:~$ sudo iptables -t nat -A PREROUTING -p tcp -d 169.254.170.2 --dport 80 -j DNAT --to-destination 127.0.0.1:51679
ubuntu:~$ sudo iptables -t nat -A OUTPUT -d 169.254.170.2 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 51679
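
To tell "connection refused" apart from a timeout without waiting for another task launch, the credentials endpoint can be probed directly from the host. This is only a minimal sketch: `probe_endpoint` is a hypothetical helper name, and it checks nothing beyond whether a TCP connection can be opened.

```python
import socket


def probe_endpoint(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP connection attempt.

    Returns 'open', 'refused', 'timeout', or 'error: <details>'.
    """
    try:
        # create_connection performs the TCP handshake and raises on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer (or the redirect target) answered with a reset.
        return "refused"
    except socket.timeout:
        # No answer at all within `timeout` seconds.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "network is unreachable".
        return f"error: {exc}"
```

Calling `probe_endpoint("169.254.170.2", 80)` on the instance should mirror what the logging driver sees. A "refused" result would suggest that the rules redirect the traffic but nothing is listening on the target port (51679 in the rules above), while "timeout" would suggest the packets are not being redirected or answered at all. Both readings are assumptions to verify, not a diagnosis.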

Note that this container definition runs and works completely fine under normal conditions, when we don't use a custom IAM execution role in the task definition. However, I am attempting to add an AWS Secrets Manager secret to the Task Definition, and this requires us to define a custom role that has access to the secret.

EDIT: Here are both the role policy JSON and the cloud-config.yml for the ECS instance:

JSON Policy Role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
        "elasticloadbalancing:DeregisterTargets",
        "elasticloadbalancing:Describe*",
        "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
        "elasticloadbalancing:RegisterTargets"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameters",
        "secretsmanager:GetSecretValue",
        "kms:Decrypt"
      ],
      "Resource": [
        "${var.aws_mysql_secret_arn}"
      ]
    }
  ]
}
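
As a quick sanity check on a policy like the one above, the following sketch lists which required actions no Allow statement grants. `missing_actions` is a hypothetical helper, and the check is deliberately simplified: it looks only at Effect and Action and ignores Resource, Condition, NotAction, and explicit Deny, so it is not a substitute for IAM's own policy evaluation. The sample policy in the code is a trimmed, made-up example, not the policy above.

```python
from fnmatch import fnmatchcase


def missing_actions(policy: dict, required: list) -> list:
    """Return the required actions that no Allow statement in `policy` lists.

    Simplification: only Effect/Action are inspected; Resource, Condition,
    NotAction, and explicit Deny statements are ignored.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    allowed = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        # Action names are matched case-insensitively here.
        allowed.extend(action.lower() for action in actions)

    # Patterns such as "ec2:Describe*" are matched as wildcards.
    return [
        req for req in required
        if not any(fnmatchcase(req.lower(), pattern) for pattern in allowed)
    ]


# Hypothetical trimmed policy that grants only the logging actions.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "*",
        }
    ],
}

print(missing_actions(
    sample_policy,
    ["logs:CreateLogStream", "logs:PutLogEvents", "secretsmanager:GetSecretValue"],
))  # ['secretsmanager:GetSecretValue']
```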

cloud-config.yml:

coreos:
  units:
   - name: update-engine.service
     command: stop
   - name: amazon-ecs-agent.service
     command: start
     runtime: true
     content: |
       [Unit]
       Description=AWS ECS Agent
       Documentation=https://docs.aws.amazon.com/AmazonECS/latest/developerguide/
       Requires=docker.socket
       After=docker.socket

       [Service]
       Environment=ECS_CLUSTER=${ecs_cluster_name}
       Environment=ECS_LOGLEVEL=${ecs_log_level}
       Environment=ECS_VERSION=${ecs_agent_version}
       Restart=on-failure
       RestartSec=30
       RestartPreventExitStatus=5
       SyslogIdentifier=ecs-agent
       ExecStartPre=-/bin/mkdir -p /var/log/ecs /var/ecs-data /etc/ecs
       ExecStartPre=-/usr/bin/docker kill ecs-agent
       ExecStartPre=-/usr/bin/docker rm ecs-agent
       ExecStartPre=iptables -t nat -A PREROUTING -p tcp -d 169.254.170.2 --dport 80 -j DNAT --to-destination 127.0.0.1:51679
       ExecStartPre=iptables -t nat -A OUTPUT -d 169.254.170.2 -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 51679
       ExecStartPre=/usr/bin/docker pull amazon/amazon-ecs-agent:$${ECS_VERSION}
       ExecStart=/usr/bin/docker run --name ecs-agent \
                                     --volume=/var/run/docker.sock:/var/run/docker.sock \
                                     --volume=/var/log/ecs:/log \
                                     --volume=/var/ecs-data:/data \
                                     --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro \
                                     --volume=/run/docker/execdriver/native:/var/lib/docker/execdriver/native:ro \
                                     --publish=127.0.0.1:51678:51678 \
                                     --env=ECS_LOGFILE=/log/ecs-agent.log \
                                     --env=ECS_LOGLEVEL=$${ECS_LOGLEVEL} \
                                     --env=ECS_DATADIR=/data \
                                     --env=ECS_CLUSTER=$${ECS_CLUSTER} \
                                     --env=ECS_AVAILABLE_LOGGING_DRIVERS='["awslogs"]' \
                                     --env=ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE=true \
                                     --log-driver=awslogs \
                                     --log-opt awslogs-region=${aws_region} \
                                     --log-opt awslogs-group=${ecs_log_group_name} \
                                     amazon/amazon-ecs-agent:$${ECS_VERSION}
asked Oct 21 '25 04:10 by depthfirstdesigner


1 Answer

If the container fails to start in this case, check two things.

  1. The permissions in the ECS execution role's policy. It should contain logs:CreateLogStream and logs:PutLogEvents, for example:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*"
        }
    ]
}
  2. The ECS agent's configuration for the awslogs driver.

The config file is /etc/ecs/ecs.config on the host. Add the awslogs driver to it, so that the file looks like this:

ECS_CLUSTER=test_ecs_cluster
ECS_AVAILABLE_LOGGING_DRIVERS=["awslogs","json-file"]
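
The second check can also be scripted. This is a rough sketch that assumes the file uses plain KEY=VALUE lines with a JSON array as the value, as in the example above. `available_logging_drivers` is a hypothetical helper name, not part of any ECS tooling.

```python
import json


def available_logging_drivers(config_text: str):
    """Return the list in ECS_AVAILABLE_LOGGING_DRIVERS, or None if it is not set."""
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ECS_AVAILABLE_LOGGING_DRIVERS":
            # The value is expected to be a JSON array of driver names.
            return json.loads(value.strip())
    return None


sample = (
    "ECS_CLUSTER=test_ecs_cluster\n"
    'ECS_AVAILABLE_LOGGING_DRIVERS=["awslogs","json-file"]\n'
)
drivers = available_logging_drivers(sample)
print("awslogs" in (drivers or []))  # True
```

On the host, the same function could be fed the contents of /etc/ecs/ecs.config. A None result means the key is not set in that file at all.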

See :

Here's a document

answered Oct 22 '25 23:10 by GNOKOHEAT