Given that Docker uses ZFS and creates legacy datasets:
$ docker ps -a | wc -l
16
$ docker volume ls | wc -l
12
$ zfs list | grep legacy | wc -l
157
16 containers (both running and stopped). 12 volumes. 157 datasets. This seems like an awful lot of legacy datasets. I'm wondering if a lot of them are so orphaned that not even docker knows about them anymore, so they don't get cleaned up.
There is a huge list of legacy datasets in my Debian zfs pool. They started appearing when I started using Docker on this machine:
$ sudo zfs list | grep legacy | wc -l
486
They are all in the form of:
pool/var/<64-char-hash> 202K 6,18T 818M legacy
This location is used solely by docker.
$ docker info | grep -e Storage -e Dataset
Storage Driver: zfs
Parent Dataset: pool/var
I started cleaning up.
$ docker system prune -a
(...)
$ sudo zfs list | grep legacy | wc -l
154
That's better. However, I'm only running about 15 containers, and after running docker system prune -a, the history of every container shows that only the last image layer is still available. The rest are <missing> (because they have been cleaned up).
$ docker images | wc -l
15
If all containers use only the last image layer after pruning the rest, shouldn't docker only use 15 image layers and 15 running containers, totalling 30 datasets?
$ sudo zfs list | grep legacy | wc -l
154
Can I find out whether they are in use by a container or image? Is there a command that traverses all pool/var/<hash> datasets in ZFS and figures out which docker container/image they belong to? Either a lot of them can be removed, or I don't understand how to verify (beyond just trusting docker system prune) that they cannot.
The excessive number of zfs datasets created by docker clutters my zfs list output, both visually and performance-wise. Listing the datasets now takes ~10 seconds instead of under 1.
$ docker ps -qa --no-trunc --filter "status=exited"
(no output)
$ docker images --filter "dangling=true" -q --no-trunc
(no output)
$ docker volume ls -qf dangling=true
(no output)
zfs list example:
NAME USED AVAIL REFER MOUNTPOINT
pool 11,8T 5,81T 128K /pool
pool/var 154G 5,81T 147G /mnt/var
pool/var/0028ab70abecb2e052d1b7ffc4fdccb74546350d33857894e22dcde2ed592c1c 1,43M 5,81T 1,42M legacy
pool/var/0028ab70abecb2e052d1b7ffc4fdccb74546350d33857894e22dcde2ed592c1c@211422332 10,7K - 1,42M -
# and 150 more of the last two with different hashes
I had the same question but couldn't find a satisfactory answer. Adding what I eventually found, since this question is one of the top search results.
The ZFS storage driver for Docker stores each layer of each image as a separate legacy dataset.
Even just a handful of images can result in a huge number of layers, each layer corresponding to a legacy ZFS dataset.
The base layer of an image is a ZFS filesystem. Each child layer is a ZFS clone based on a ZFS snapshot of the layer below it. A container is a ZFS clone based on a ZFS snapshot of the top layer of the image it’s created from.
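You can see these parent/child relationships directly in ZFS by listing each dataset together with the snapshot it was cloned from (a sketch, assuming pool/var is the parent dataset from the question):
$ # show each docker-created dataset and the origin snapshot it was cloned from
$ zfs list -r -o name,origin pool/var
A base layer shows "-" as its origin; every clone shows the @snapshot of the layer beneath it.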
You can check the datasets used by one image by running:
$ docker image inspect [IMAGE_NAME]
Example output:
...
"RootFS": {
    "Type": "layers",
    "Layers": [
        "sha256:f2cb0ecef392f2a630fa1205b874ab2e2aedf96de04d0b8838e4e728e28142da",
        ...
        "sha256:2e8cc9f5313f9555a4decca744655ed461e21fbe48a0f078ed5f7c4e5292ad2e"
    ]
},
...
This explains why you can see 150+ datasets created when only running a dozen containers.
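Note that the sha256 digests in RootFS.Layers are content hashes, and they do not necessarily match the dataset names you see under pool/var. To map a specific container back to the dataset that backs it, the zfs storage driver exposes this via docker inspect; the following is a sketch, and the GraphDriver.Data "Dataset" key is what I observed with the zfs driver, so it may differ between Docker versions:
$ # print the ZFS dataset backing a single container
$ docker inspect --format '{{ index .GraphDriver.Data "Dataset" }}' [CONTAINER_NAME]
$ # do the same for every container, running or stopped
$ for c in $(docker ps -aq); do docker inspect --format '{{ .Name }} {{ index .GraphDriver.Data "Dataset" }}' "$c"; done
Datasets that never show up in this output, and are not referenced by any image layer, are candidates for being orphaned.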
Prune and delete unused images.
$ docker image prune -a
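After pruning, you can re-run the count from the question to confirm that the number of legacy datasets has gone down:
$ sudo zfs list | grep legacy | wc -l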
To avoid a slow zfs list, specify the dataset of interest.
Suppose you store docker in tank/docker and other files in tank/data. List only the data datasets with the recursive (-r) option:
# recursively list tank/data/*
$ zfs list tank/data -r
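If, as in the question, the docker datasets sit under a deeper parent such as pool/var, limiting the listing depth works as well; this sketch assumes the layout shown above:
$ # list the pool and its direct children only, hiding the per-layer datasets one level further down
$ zfs list -d 1 pool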