I have the following expression for an alert in Prometheus:
absent_over_time(node_filesystem_size_bytes{device=~"/dev/nvme.*"}[5m]) > 0
As far as I can tell, this will be > 0 if there are no metrics for /dev/nvme* in the last 5 minutes.
However, I would like this to take into account the instance
that it runs on. That is, I want this to be triggered if any instance (preferably with a label saying which one(s)) is missing this metric over the last 5 minutes. I'm assuming that if one node is successfully scraped, this condition won't be true anymore.
There is an instance
label on the node_filesystem_size_bytes
metric, but if that metric is missing, I'm not sure how it could detect that.
Do I need to somehow get a list of instance
s at the current time and join it with node_filesystem_size_bytes
to accomplish this? Or is there some other way?
This is using node-exporter and the kube-prometheus-stack in a Kubernetes cluster.
Try the following query:
(node_filesystem_size_bytes{device=~"/dev/nvme.*"} offset 5m)
unless
node_filesystem_size_bytes{device=~"/dev/nvme.*"}
It should return time series matching the node_filesystem_size_bytes{device=~"/dev/nvme.*"}
with all their labels, which had raw samples 5 minutes ago, but have no new samples aftewards.
Note that this query can be simplified to the following one when using VictoriaMetrics - Prometheus-like system I work on:
lag(node_filesystem_size_bytes{device=~"/dev/nvme.*"}[10m]) > 5m
It uses the lag() function, which returns the duration since the last raw sample per every input time series.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With