I am using kube-prometheus-stack and the yaml snippets you see below are part of a PrometheusRule definition.
This is a completely hypothetical scenario, the simplest one I could think of that illustrates my point.
Given this kind of metric:
cpu_usage{job="job-1", must_be_lower_than="50"} 33.72
cpu_usage{job="job-2", must_be_lower_than="80"} 56.89
# imagine there are plenty more lines here
# with various different values for the must_be_lower_than label
# ...
I'd like to have alerts that use the must_be_lower_than label as the threshold. Something like this (this doesn't work the way it's written, it's just to illustrate the idea):
alert: CpuUsageTooHigh
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above {{ $labels.must_be_lower_than }}% for 5 minutes.'
expr: cpu_usage > $must_be_lower_than
for: 5m
P.S. I already know I can define alerts like this:
alert: CpuUsageTooHigh50
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above 50% for 5 minutes.'
expr: cpu_usage{must_be_lower_than="50"} > 50
for: 5m
---
alert: CpuUsageTooHigh80
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above 80% for 5 minutes.'
expr: cpu_usage{must_be_lower_than="80"} > 80
for: 5m
This is not what I'm looking for, because I would have to manually define an alert for each of the possible values of the must_be_lower_than label.
See @markalex's comment on this post: the absent() function can be used to generate a metric with the required labels:
cpu_usage > ON(must_be_lower_than) GROUP_LEFT (absent(non_existent{must_be_lower_than="80"}) * 80 or absent(non_existent{must_be_lower_than="50"}) * 50)
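For completeness, here is a minimal sketch of an alerting rule built around that expression. The alert name is taken from your example, and the thresholds 50 and 80 are just the ones shown above; extend the or chain with every value of must_be_lower_than you actually use:
alert: CpuUsageTooHigh
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above {{ $labels.must_be_lower_than }}% for 5 minutes.'
expr: cpu_usage > ON(must_be_lower_than) GROUP_LEFT (absent(non_existent{must_be_lower_than="80"}) * 80 or absent(non_existent{must_be_lower_than="50"}) * 50)
for: 5m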
Old answer
There is currently no way in Prometheus to have this kind of "templating".
The only way to get something close would be to use recording rules that define one series per possible value of the label:
rules:
  - record: max_cpu_usage
    expr: vector(50)
    labels:
      must_be_lower_than: "50"
  - record: max_cpu_usage
    expr: vector(80)
    labels:
      must_be_lower_than: "80"
  # ... other possible values
Then use it in your alerting rule:
alert: CpuUsageTooHigh
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above {{ $labels.must_be_lower_than }}% for 5 minutes.'
expr: cpu_usage > ON(must_be_lower_than) GROUP_LEFT max_cpu_usage
for: 5m
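Since you're using kube-prometheus-stack, both the recording rules and the alert would live in a PrometheusRule resource. A minimal sketch (the metadata name, namespace and release label are assumptions and must match your Prometheus ruleSelector):
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-usage-thresholds        # hypothetical name
  namespace: monitoring             # hypothetical namespace
  labels:
    release: kube-prometheus-stack  # must match your ruleSelector
spec:
  groups:
    - name: cpu-usage-thresholds.rules
      rules:
        - record: max_cpu_usage
          expr: vector(50)
          labels:
            must_be_lower_than: "50"
        - record: max_cpu_usage
          expr: vector(80)
          labels:
            must_be_lower_than: "80"
        - alert: CpuUsageTooHigh
          annotations:
            message: 'On job {{ $labels.job }}, the cpu usage has been above {{ $labels.must_be_lower_than }}% for 5 minutes.'
          expr: cpu_usage > ON(must_be_lower_than) GROUP_LEFT max_cpu_usage
          for: 5m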
Prometheus is still (and I believe always will be) against mixing labels and values. The only exception to this rule is count_values, which allows converting a metric's value into a label, but that's it: there is no mechanism to do the opposite.
Regarding your idea, I believe you're approaching it in a slightly wrong way. If you want to alert on some of your metrics based on a threshold specific to the machine those metrics come from, you should use additional metrics instead of additional labels.
I'm sorry, I'm not that familiar with kube-prometheus-stack, so I'll be using node_exporter as an example; it should be easy to adapt this to other exporters.
So for your example, you should create a textfile metric
my_metric_threshold 80
configure node_exporter's textfile collector to expose it, and then use an alert rule like this:
alert: CpuUsageTooHigh
annotations:
  message: 'On job {{ $labels.job }}, the cpu usage has been above {{ $value }}% for 5 minutes.'
expr: my_metric_threshold < on(instance) my_metric
for: 5m
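For reference, here is a sketch of what the file read by node_exporter's textfile collector could contain (the path and file name are assumptions; node_exporter picks up *.prom files from the directory passed via --collector.textfile.directory):
# /var/lib/node_exporter/textfile/thresholds.prom (hypothetical path)
# HELP my_metric_threshold Per-machine alerting threshold.
# TYPE my_metric_threshold gauge
my_metric_threshold 80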
This way your thresholds are tied to your machines, and you don't need to reload Prometheus' config when you decide to add or change a threshold.
Also, you can get more granularity with less hassle. To stay with the node_exporter example, you can use textfile metrics
cpu_must_be_lower_than{cpu="0"} 80
cpu_must_be_lower_than{cpu="1"} 50
and the expression cpu_must_be_lower_than < on(instance, cpu) (100 - 100 * rate(node_cpu_seconds_total{mode="idle"}[5m])).
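A sketch of an alert built on that expression (the alert name and message wording are mine, and the 5m rate window is an assumption):
alert: CpuCoreUsageTooHigh
annotations:
  message: 'On instance {{ $labels.instance }}, CPU {{ $labels.cpu }} has been above {{ $value }}% for 5 minutes.'
expr: cpu_must_be_lower_than < on(instance, cpu) (100 - 100 * rate(node_cpu_seconds_total{mode="idle"}[5m]))
for: 5m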