We've been experiencing a long-standing networking issue. In short, one container cannot ping (or ssh) another. Does anybody have an extra moment to think along with me?
Our setup:
What we've tried so far:
This issue has largely stumped us. We've spent a lot of time on it and done much of the basic troubleshooting, and some more advanced troubleshooting (happy to elaborate). (But I don't expect that I've exhausted our options, so please don't hesitate to suggest anything you may think will work.) It's inconsistent (happening to different images, different nodes), intermittent, and long-standing (several months). We've made two changes, one of which was a workaround for MAC address assignment (explained here: https://github.com/docker/libnetwork/pull/2380; the actual workaround: https://github.com/systemd/systemd/issues/3374#issuecomment-452718898), which did improve the situation, including removing MAC address assignment errors from the logs. We also upgraded to get this fix (https://github.com/docker/libnetwork/pull/1935), which deals with IP reuse. This also decreased the problem (at the time, no containers could communicate). I've also run through some basics tests using the netshoot container (let me know if you want more info on that).
We have a workaround for a given container that is broken: we delete the Consul data for this container and then stop and restart it. From what I can tell, it does not seem to be an issue with the Consul data per se but instead comes from Docker/Swarm resetting several network configurations when the container is started (I can say more if this seems to trigger a thought for anybody reading). Then, the container can often ping other containers, but not always.
Specific question:
It seems like there's a window of time during which this can be worse. It's not necessarily tied to starting several containers at once, but there's a somewhat clear pattern: during some windows of time, containers do not get configured properly to communicate with each other. What troubleshooting steps come to mind for you?
The content below is the output from trying to ping one container (82afb0dccbcc) from two other containers. It fails at first, but then is successful.
The first time I try to ping the container, at 2019-12-10T23:57:52+00:00:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
82afb0dccbcc: user___92397089 crccheck/hello-world
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
PING 82afb0dccbcc (172.24.0.165) 56(84) bytes of data.^M
^M
--- 82afb0dccbcc ping statistics ---^M
4 packets transmitted, 0 received, 100% packet loss, time 3033ms^M
^M
PING 82afb0dccbcc (172.24.0.165) 56(84) bytes of data.^M
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=2 ttl=64 time=0.083 ms^M
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=3 ttl=64 time=0.072 ms^M
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=4 ttl=64 time=0.073 ms^M
^M
--- 82afb0dccbcc ping statistics ---^M
4 packets transmitted, 3 received, 25% packet loss, time 3023ms^M
rtt min/avg/max/mdev = 0.072/0.076/0.083/0.005 ms^M
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
In this first ping test, above, we note that the packet loss from the first container is 100% and from the second container, it is 25%.
A few minutes later (2019-12-10T23:57:52+00:00), however, 82afb0dccbcc can be successfully pinged from both containers:
82afb0dccbcc: user___92397089 crccheck/hello-world
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ping from ansible-provisioner:
PING 82afb0dccbcc (172.24.0.165) 56(84) bytes of data.^M
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=1 ttl=64 time=0.056 ms^M
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=2 ttl=64 time=0.073 ms^M
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=3 ttl=64 time=0.077 ms^M
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=4 ttl=64 time=0.087 ms^M
^M
--- 82afb0dccbcc ping statistics ---^M
4 packets transmitted, 4 received, 0% packet loss, time 3063ms^M
rtt min/avg/max/mdev = 0.056/0.073/0.087/0.012 ms^M
ping from ansible_container:
PING 82afb0dccbcc (172.24.0.165) 56(84) bytes of data.^M
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=1 ttl=64 time=0.055 ms^M
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=2 ttl=64 time=0.055 ms^M
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=3 ttl=64 time=0.060 ms^M
64 bytes from user___92397089.wharf (172.24.0.165): icmp_seq=4 ttl=64 time=0.085 ms^M
^M
--- 82afb0dccbcc ping statistics ---^M
4 packets transmitted, 4 received, 0% packet loss, time 3062ms^M
rtt min/avg/max/mdev = 0.055/0.063/0.085/0.015 ms^M
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
You need to create a network and connect both the containers to that network.
The Docker embedded DNS server enables name resolution for containers connected to a given network. This means that any connected container can ping another container on the same network by its container name.
From within container1, you can ping container2 by name.So, its important to explicitly specify names for the containers otherwise this would not work.
Create two containers:
docker run -d --name container1 -p 8001:80 test/apache-php
docker run -d --name container2 -p 8002:80 test/apache-php
Now create a network:
docker network create myNetwork
After that connect your containers to the network:
docker network connect myNetwork container1
docker network connect myNetwork container2
Check if your containers are part of the new network:
docker network inspect myNetwork
Now test the connection, you will be able to ping container2 from container1:
docker exec -ti container1 ping container2
I actually ran into this issue randomly, but in my case both containers were already on the same network so it was puzzling me why one container couldn't ping another.
until I ran docker network inspect myNetwork and randomly noticed that for some reason both containers were assigned the SAME mac address... no idea why that happened or even how. Obviously that would preclude pinging since on a LAN mac addresses are used by switching logic to route traffic.
I had to stop and remove the container then recreate it to change the mac address.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With