I am writing a simple monitoring script to which I would like to add disk space checks. I found, however, that the reported usage differs between the system df and shutil.disk_usage().
On a system which has three disks mounted:
# df / /mnt/2TB1 /mnt/1TB1
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 472437724 231418380 216997128 52% /
/dev/sdb1 1921802520 1712163440 111947020 94% /mnt/2TB1
/dev/sdc1 960380648 347087300 564438888 39% /mnt/1TB1
# python3
Python 3.6.8 (default, Jan 14 2019, 11:02:34)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shutil
>>> (t, u, f) = shutil.disk_usage('/')
>>> (t, u, f)
(483776229376, 236973805568, 222203674624)
>>> u/t
0.48984177224594366
>>> (t, u, f) = shutil.disk_usage('/mnt/2TB1')
>>> (t, u, f)
(1967925780480, 1753255362560, 114633748480)
>>> u/t
0.8909153891628782
>>> (t, u, f) = shutil.disk_usage('/mnt/1TB1')
>>> (t, u, f)
(983429783552, 355400192000, 578002624512)
>>> u/t
0.361388477290517
The differences are 3, 5 and 3 percentage points, respectively. Where do they come from, and which result is the correct one?
As ChristiFati already pointed out, the ratios used / total are the same for both tools, but the Use% field reported by df differs from 100 · used / total.
As an example, let's examine the values for /dev/sda1 mounted on /.
df.total = 472437724
df.used = 231418380
df.available = 216997128
df.percentage = 52
shutil.total = 483776229376
shutil.used = 236973805568
shutil.free = 222203674624
df.used / df.total = 0.4898 = shutil.used / shutil.total
but …
df.used / df.total = 0.4898 ≠ 0.52 = df.percentage / 100
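The comparison above can be re-checked with a few lines of Python, using only the numbers quoted from the question. This is a sketch and the variable names are chosen here for illustration. df reports 1 KiB blocks while shutil reports bytes, so only ratios are compared; the two used values differ by a tiny amount, presumably because the commands ran a moment apart.

```python
# Values for /dev/sda1 copied from the question.
# df reports 1 KiB blocks, shutil.disk_usage() reports bytes.
df_total, df_used, df_percentage = 472437724, 231418380, 52
shutil_total, shutil_used = 483776229376, 236973805568

# Units cancel out in a ratio, so the two tools can be compared directly.
df_ratio = df_used / df_total
shutil_ratio = shutil_used / shutil_total

print(f"df used/total:     {df_ratio:.4f}")
print(f"shutil used/total: {shutil_ratio:.4f}")
print(f"df Use% / 100:     {df_percentage / 100:.2f}")
```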
The source code of coreutils' df implementation sheds some light on this issue. The three lines 1171-1173 are relevant. pct is the percentage.
uintmax_t u100 = v->used * 100;
uintmax_t nonroot_total = v->used + v->available;
pct = u100 / nonroot_total + (u100 % nonroot_total != 0);
As we can see, df does not compute used / total but used / (used + available), and it rounds the result up to the next integer. Note that used + available < total.
total includes space which is reserved for metadata, such as the records of where each file resides in the file system (depending on the file system, this can include FAT tables, inodes, …). Since you cannot use that space for regular files, it is excluded from Use% by dividing by (used + available) instead, which does not include metadata.
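To check that the three quoted lines really explain the Use% column, here is a minimal Python sketch of the same arithmetic applied to the df rows from the question. The function name df_use_percent is made up for this example, and raising an error when used + available is zero is a choice made here, not something taken from df.

```python
def df_use_percent(used: int, available: int) -> int:
    """Mimic the quoted coreutils lines: 100 * used / (used + available), rounded up."""
    nonroot_total = used + available
    if nonroot_total == 0:
        # Assumption for this sketch: treat an empty denominator as an error.
        raise ValueError("used + available is zero; percentage undefined")
    u100 = used * 100
    # Integer division plus 1 if there is a remainder, i.e. a ceiling.
    return u100 // nonroot_total + (u100 % nonroot_total != 0)


# (mount point, Used, Available) in 1 KiB blocks, copied from the df output in the question.
rows = [
    ("/", 231418380, 216997128),
    ("/mnt/2TB1", 1712163440, 111947020),
    ("/mnt/1TB1", 347087300, 564438888),
]

for mount, used, available in rows:
    print(mount, f"{df_use_percent(used, available)}%")
# /          52%
# /mnt/2TB1  94%
# /mnt/1TB1  39%
```

These match the Use% values df printed in the question.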
However, this cannot be the complete story. The following script generates a FAT12 and an ext2 file system inside a 2 MiB file. The script has to be executed using sudo.
#! /bin/bash
check() {
head -c 2MiB /dev/zero > fs
mkfs."$@" fs
mkdir fsmount
mount -o loop fs fsmount
df fsmount
umount fsmount
rm -r fs fsmount
}
echo fat12:
check fat -F 12
echo ext2:
check ext2
I got the output
fat12:
[...]
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/loop0 2028 0 2028 0% /tmp/fsmount
ext2:
[...]
Creating filesystem with 2048 1k blocks and 256 inodes
[...]
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/loop0 2011 21 1888 2% /tmp/fsmount
Note that both total sizes are smaller than the file system image, which is 2048 KiB = 2 MiB in both cases. Neither file system contained any files, but for ext2 df reported 21 KiB as used (may be related to this question).
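In the ext2 run above, 2011 - 21 - 1888 = 102 KiB is neither used nor available. One way to look at such a gap on a mounted file system is to read the raw counters from os.statvfs(), which distinguishes all free blocks (f_bfree) from the blocks available to unprivileged users (f_bavail). The sketch below is Unix only; the function name and dictionary keys are made up for illustration, and whether the difference it prints accounts for the whole gap on a given file system is an assumption to verify, not a claim.

```python
import os


def block_counts(path="/"):
    """Return statvfs block counters for `path`, converted to KiB (Unix only)."""
    st = os.statvfs(path)

    def to_kib(blocks):
        return blocks * st.f_frsize // 1024

    counts = {
        "total_kib": to_kib(st.f_blocks),
        "used_kib": to_kib(st.f_blocks - st.f_bfree),
        "free_kib": to_kib(st.f_bfree),        # all free blocks
        "available_kib": to_kib(st.f_bavail),  # free blocks for unprivileged users
    }
    # Space that is free but not available to unprivileged users.
    counts["free_minus_available_kib"] = counts["free_kib"] - counts["available_kib"]
    return counts


if __name__ == "__main__":
    for key, value in block_counts("/").items():
        print(f"{key:>26}: {value}")
```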
Python appears to have the correct results.
By default, [man7]: DF(1) (man df) displays numbers (sizes) in 1 KiB blocks. But since the same operation (division by 1024) is applied to both the dividend and the divisor when computing the percentage, it cancels out, so it shouldn't have anything to do with the final result.
Example (for a certain dir):
1. df
2. df -B 1 (output in bytes)
3. Run the following Python script:
import sys, shutil
path = sys.argv[1] if len(sys.argv) > 1 else "/"
t, u, f = shutil.disk_usage(path)
percent = 100 * u / t
print("(Python) - Volume name\t{:} {:} {:} {:.3f}% ({:.0f}) {:}".format(t, u, f, percent, percent, path))
[cfati@cfati-ubtu16x64-0:~]> for f in "/" "/media/sf_shared_00"; do echo df "${f}" && df ${f} && echo df -B 1 "${f}" && df -B 1 ${f} && echo Python script on "${f}" && python3 -c "import sys, shutil; path = sys.argv[1] if len(sys.argv) > 1 else \"/\"; t, u, f = shutil.disk_usage(path); percent = 100 * u / t; print(\"(Python) - Volume name\t{:} {:} {:} {:.3f}% ({:.0f}) {:}\".format(t, u, f, percent, percent, path))" ${f} && echo && echo; done
df /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/ubtu16x640_lvg0-ubtu16x640_root0 102067544 10999896 85859792 12% /
df -B 1 /
Filesystem 1B-blocks Used Available Use% Mounted on
/dev/mapper/ubtu16x640_lvg0-ubtu16x640_root0 104517165056 11263893504 87920427008 12% /
Python script on /
(Python) - Volume name 104517165056 11263893504 87920427008 10.777% (11) /

df /media/sf_shared_00
Filesystem 1K-blocks Used Available Use% Mounted on
shared_00 327679996 155279796 172400200 48% /media/sf_shared_00
df -B 1 /media/sf_shared_00
Filesystem 1B-blocks Used Available Use% Mounted on
shared_00 335544315904 159006511104 176537804800 48% /media/sf_shared_00
Python script on /media/sf_shared_00
(Python) - Volume name 335544315904 159006511104 176537804800 47.388% (47) /media/sf_shared_00
As seen, the numbers (sizes) from step #2 are identical to the ones from step #3. Computing the percentage as used / total (in any of the 3 cases), the Python percentage seems to be the correct one.
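For a monitoring script, both figures can be derived from a single shutil.disk_usage() call. The sketch below is one possible approach, not a definitive implementation: the function name usage_percentages is hypothetical, and the df-style value assumes the rounded-up used / (used + available) formula from the coreutils lines quoted earlier in this thread. It also assumes that the free value returned by shutil.disk_usage() corresponds to df's Available column, which is consistent with the numbers in the question (222203674624 bytes is roughly 216997128 KiB).

```python
import shutil


def usage_percentages(path="/"):
    """Return (used/total in percent, df-style Use% or None) for `path`."""
    total, used, free = shutil.disk_usage(path)

    # Plain ratio, as computed in the Python snippets in this thread.
    raw = 100 * used / total if total else 0.0

    # df-style: divide by used + free and round up (assumed formula, see lead-in).
    nonroot_total = used + free
    if nonroot_total:
        df_style = (100 * used + nonroot_total - 1) // nonroot_total
    else:
        df_style = None  # nothing to divide by, e.g. on some pseudo file systems

    return raw, df_style


if __name__ == "__main__":
    raw, df_style = usage_percentages("/")
    print(f"used/total: {raw:.3f}%  df-style Use%: {df_style}")
```

Alerting on the df-style value keeps thresholds comparable with what df prints, while the plain ratio relates usage to the full size of the file system.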
It's unclear to me why df reports those percentages (I didn't look at the source code), but it could be one of the following (everything that follows is pure speculation):
#pragma pack), the file will take 2 sectors (8 KiB), and therefore its underlying size will be greater than the reported one