记录规则应为一般形式 level:metric:operations。level 代表规则输出的聚合级别和标签。metric 是度量名称,使用rate() 或 irate()时,除了从计数器剥离 _total,应保持不变。_total off counters when using rate() or irate(). operations 是应用到度量标准的操作列表,最新操作优先。
保持数据指标名称不变,可以轻松了解一个数据指标是什么,也可以在代码库中轻松找到。
为了保持操作整洁,如果还有其他操作,则将_sum省略,如sum()。关联操作可以合并(如,min_min 与 min 相同).
- record: instance_path:requests:rate5m
expr: rate(requests_total{job="myjob"}[5m])
- record: path:requests:rate5m
expr: sum without (instance)(instance_path:requests:rate5m{job="myjob"})
计算请求失败率并汇总到作业级别失败率:
- record: instance_path:request_failures:rate5m
expr: rate(request_failures_total{job="myjob"}[5m])
- record: instance_path:request_failures_per_requests:ratio_rate5m
expr: |2
instance_path:request_failures:rate5m{job="myjob"}
/
instance_path:requests:rate5m{job="myjob"}
# Aggregate up numerator and denominator, then divide to get path-level ratio.
- record: path:request_failures_per_requests:ratio_rate5m
expr: |2
sum without (instance)(instance_path:request_failures:rate5m{job="myjob"})
/
sum without (instance)(instance_path:requests:rate5m{job="myjob"})
# No labels left from instrumentation or distinguishing instances,
# so we use 'job' as the level.
- record: job:request_failures_per_requests:ratio_rate5m
expr: |2
sum without (instance, path)(instance_path:request_failures:rate5m{job="myjob"})
/
sum without (instance, path)(instance_path:requests:rate5m{job="myjob"})
根据一个 Summary 类型的数据指标计算一段时间内的平均延迟:
- record: instance_path:request_latency_seconds_count:rate5m
expr: rate(request_latency_seconds_count{job="myjob"}[5m])
- record: instance_path:request_latency_seconds_sum:rate5m
expr: rate(request_latency_seconds_sum{job="myjob"}[5m])
- record: instance_path:request_latency_seconds:mean5m
expr: |2
instance_path:request_latency_seconds_sum:rate5m{job="myjob"}
/
instance_path:request_latency_seconds_count:rate5m{job="myjob"}
# Aggregate up numerator and denominator, then divide.
- record: path:request_latency_seconds:mean5m
expr: |2
sum without (instance)(instance_path:request_latency_seconds_sum:rate5m{job="myjob"})
/
sum without (instance)(instance_path:request_latency_seconds_count:rate5m{job="myjob"})
使用avg() 函数计算出不同 instance 和 path 的平均查询率:
- record: job:request_latency_seconds_count:avg_rate5m
expr: avg without (instance, path)(instance:request_latency_seconds_count:rate5m{job="myjob"})