Been stuck on this for a week.
I have a Fargate service running in a private subnet, and I want to limit the container's access to the private network alone, but I'm not able to pull an image from my private ECR repo over the private network.
When the container is launched, I get the following error:
CannotPullContainerError: ref pull has been retried 5 time(s): failed to copy: httpReadSeeker: failed open: failed to do request: Get 956469741060.dkr.ecr.us-east-1.amazonaws.com/my-ecr-repo:latest: dial tcp 52.216.78.32:443: i/o timeout
So the container is still trying to pull the ECR image over a public IP (my VPC CIDR is 10.0.0.0/16). Needless to say, the Fargate container is able to pull the ECR image once I open 0.0.0.0/0 for my Fargate egress, but I want to avoid that and only allow ingress/egress to the private subnets.
I confirmed the VPC endpoint configuration by launching an EC2 instance in a private subnet and running nslookup against all of my VPC endpoints. All of them return private IPs, which tells me the endpoints are configured correctly.
Because of the EC2 nslookup test, I assume the issue is in my Fargate configuration. This is how the Terraform setup looks:
resource "aws_ecs_cluster" "test_sdk" {
  name = "test-sdk-${var.stage}"
}
resource "aws_ecs_task_definition" "test_task_def" {
  family                   = "test-sdk-${var.stage}"
  network_mode             = "awsvpc"
  task_role_arn            = aws_iam_role.ecs_task_execution_role.arn
  execution_role_arn       = aws_iam_role.ecs_task_execution_role.arn
  requires_compatibilities = ["FARGATE"]
  cpu                      = 4096
  memory                   = 8192
  container_definitions = jsonencode(
    [
      {
        "name": "test-container",
        "image": "${data.aws_caller_identity.self.account_id}.dkr.ecr.${var.region}.amazonaws.com/test-sdk-${var.stage}:latest",
        "essential": true,
        "portMappings": [
          {
            "containerPort": var.container_port,
            "hostPort": var.container_port
          }
        ],
        "logConfiguration": {
          "logDriver": "awslogs",
          "options": {
            "awslogs-group": "ecs-test-${var.stage}",
            "awslogs-region": "${var.region}",
            "awslogs-stream-prefix": "streaming"
          }
        }
      }
    ]
  )
}
resource "aws_ecs_service" "test_service" {
  name            = "test-service"
  cluster         = aws_ecs_cluster.test_sdk.id
  task_definition = aws_ecs_task_definition.test_task_def.arn
  launch_type     = "FARGATE"
  desired_count   = 1
  network_configuration {
    subnets         = [data.aws_subnet.private-1.id, data.aws_subnet.private-2.id]
    security_groups = [aws_security_group.test-sg.id]
  }
  load_balancer {
    target_group_arn = aws_lb_target_group.test-tg.arn
    container_name   = "test-container"
    container_port   = var.container_port
  }
}
# Create a security group allowing traffic on container port
resource "aws_security_group" "test-sg" {
  name   = "test-sg-${var.stage}"
  vpc_id = data.aws_vpc.vpc.id
  ingress {
    from_port = var.container_port
    to_port   = var.container_port
    protocol  = "tcp"
    cidr_blocks = [
      data.aws_subnet.private-1.cidr_block,
      data.aws_subnet.private-2.cidr_block
    ]
  }
  ingress {
    from_port = 443
    to_port   = 443
    protocol  = "tcp"
    cidr_blocks = [
      data.aws_subnet.private-1.cidr_block,
      data.aws_subnet.private-2.cidr_block
    ] # Allow traffic from private subnet
  }
  egress {
    from_port = var.container_port
    to_port   = var.container_port
    protocol  = "tcp"
    cidr_blocks = [
      data.aws_subnet.private-1.cidr_block,
      data.aws_subnet.private-2.cidr_block
    ] # Allow traffic to private subnet
  }
  egress {
    from_port = 443
    to_port   = 443
    protocol  = "tcp"
    cidr_blocks = [
      data.aws_subnet.private-1.cidr_block,
      data.aws_subnet.private-2.cidr_block
    ] # Allow traffic to private subnet
  }
}
# Create Application Load Balancer
resource "aws_lb" "test" {
  name               = "test-lb-${var.stage}"
  internal           = true
  load_balancer_type = "application"
  security_groups    = [aws_security_group.test-sg.id]
  subnets            = [data.aws_subnet.private-1.id, data.aws_subnet.private-2.id]
}
# Create Target Group
resource "aws_lb_target_group" "test-tg" {
  name        = "test-tg-${var.stage}"
  port        = var.container_port
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = data.aws_vpc.vpc.id
  health_check {
    enabled             = true
    healthy_threshold   = 2
    interval            = 90
    path                = "/"
    matcher             = "200-399"
    port                = var.container_port
    protocol            = "HTTP"
    timeout             = 40
    unhealthy_threshold = 2
  }
}
# Create listener
resource "aws_lb_listener" "test-listener" {
  load_balancer_arn = aws_lb.test.arn
  port              = var.container_port
  protocol          = "HTTP"
  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.test-tg.arn
  }
}
# IAM
resource "aws_iam_role" "ecs_task_execution_role" {
  name               = "tf-${var.project}-${var.stage}-ecs-task-execution-role"
  assume_role_policy = data.aws_iam_policy_document.ecs_assume_role_policy.json
  inline_policy {
    name = "test-sdk-ecr-repo-policy"
    policy = jsonencode({
      "Version" : "2012-10-17",
      "Statement" : [
        {
          "Effect" : "Allow",
          "Action" : [
            "ecr:GetAuthorizationToken",
            "ecr:BatchCheckLayerAvailability",
            "ecr:GetDownloadUrlForLayer",
            "ecr:BatchGetImage",
            "logs:CreateLogGroup",
            "logs:DescribeLogGroups",
            "logs:DescribeLogStreams",
            "logs:CreateLogStream",
            "logs:PutLogEvents",
            "secretsmanager:GetSecretValue",
            "events:PutEvents"
          ],
          "Resource" : "*"
        }
      ]
    })
  }
}
data "aws_iam_policy_document" "ecs_assume_role_policy" {
  statement {
    actions = [
      "sts:AssumeRole"
    ]
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}
If you are using an S3 gateway endpoint, it does not create a network interface on the VPC. So even though your security group allows traffic to your VPC, it won't work for fetching the image (which ECR stores in S3 under the covers) without some modification.
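For illustration, a gateway endpoint in Terraform might look roughly like the sketch below. It attaches to route tables rather than to subnets or security groups, which is why the S3 traffic still targets public S3 addresses (routed privately) and is blocked by an egress rule that only allows the subnet CIDRs. The resource name `s3` and the variable `private_route_table_id` are assumptions for this sketch; they do not appear in the question.

```hcl
# Hypothetical input: the route table used by the private subnets.
variable "private_route_table_id" {
  type = string
}

# A gateway endpoint adds a route for the S3 prefix list to the given
# route tables. It has no ENI, no private IP, and no security group.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = data.aws_vpc.vpc.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [var.private_route_table_id]
}
```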
So as you mentioned in your comment, the solution is to add the prefix list ID that was created for S3 to the security group. Essentially, this is adding the S3 IP addresses as an allowlist for outbound communication.
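A minimal sketch of that change, assuming the security group from the question and looking the prefix list up by service name; the data source name `s3` is made up for this example, and the existing blocks are elided rather than repeated:

```hcl
# Look up the AWS-managed prefix list for S3 in the current region.
data "aws_prefix_list" "s3" {
  name = "com.amazonaws.${var.region}.s3"
}

resource "aws_security_group" "test-sg" {
  # ... existing name, vpc_id, ingress and egress blocks stay as they are ...

  # Additional rule: allow HTTPS out to S3 only, via the prefix list,
  # so image layers can be fetched through the gateway endpoint.
  egress {
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    prefix_list_ids = [data.aws_prefix_list.s3.id]
  }
}
```

This keeps egress closed to 0.0.0.0/0 while still letting the task reach the S3 address ranges that the gateway endpoint routes privately.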
This documentation outlines the details:

Also, this guide outlines the differences and pros/cons of using an S3 gateway endpoint versus an interface endpoint.