Prometheus Recording Rules and Alerting Rules

2023-06-23 20:19:52

Prerequisites:

A Docker environment

Reference documentation:

Prometheus recording rules
Prometheus alerting rules

Checking the syntax of a rules file:

promtool check rules /path/to/example.rules.yml

1: Recording rule syntax

groups syntax:

groups:
  [ - <rule_group> ]

rule_group syntax:

# The name of the group. Must be unique within a file.
name: <string>

# How often rules in the group are evaluated.
[ interval: <duration> | default = global.evaluation_interval ]

# Limit the number of alerts an alerting rule and series a recording
# rule can produce. 0 is no limit.
[ limit: <int> | default = 0 ]

rules:
  [ - <rule> ... ]
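
As an illustration of the rule_group fields above, here is a minimal sketch of one group that sets its own interval and limit. The group name, the record name, and the values chosen are assumptions made for this example, not taken from a real deployment.

```yaml
groups:
- name: example-group          # hypothetical name; must be unique within the file
  interval: 30s                # overrides global.evaluation_interval for this group
  limit: 100                   # a rule producing more than 100 series (or alerts) fails; 0 means no limit
  rules:
  - record: job:up:sum         # hypothetical recording rule
    expr: sum by (job) (up)    # number of targets currently up, per job
```

A file like this can be validated with the promtool check rules command shown earlier.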

rule syntax:

# The name of the time series to output to. Must be a valid metric name.
record: <string>

# The PromQL expression to evaluate. Every evaluation cycle this is
# evaluated at the current time, and the result recorded as a new set of
# time series with the metric name as given by 'record'.
expr: <string>

# Labels to add or overwrite before storing the result.
labels:
  [ <labelname>: <labelvalue> ]

Example rules file:

groups:
- name: cpu-node
  rules:
  - record: job_instance_mode:node_cpu_seconds:avg_rate5m
    expr: avg by (job, instance, mode) (rate(node_cpu_seconds_total{instance="10.1.32.231"}[5m]))
    labels:
      job_instance_mode: node_cpu_seconds
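
For Prometheus to evaluate a rules file, the file has to be listed under rule_files in the main configuration. The fragment below is a sketch only: the path is an assumption and should match wherever the rules are mounted inside the container.

```yaml
# prometheus.yml (fragment)
global:
  evaluation_interval: 1m          # default for groups that do not set their own interval
rule_files:
  - /etc/prometheus/rules/*.yml    # hypothetical path; globs are accepted
```

Once the rules are loaded, the recorded series can be queried by its new name, job_instance_mode:node_cpu_seconds:avg_rate5m, like any other metric.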

2: Alerting rule syntax

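Alerting rules let you define alert conditions as Prometheus expression language (PromQL) expressions and send notifications about firing alerts to an external service.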
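
The external service is typically Alertmanager. A minimal sketch of pointing Prometheus at it follows, assuming an Alertmanager instance reachable under the hypothetical host name alertmanager (for example, another container on the same Docker network).

```yaml
# prometheus.yml (fragment)
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "alertmanager:9093"   # hypothetical host; 9093 is Alertmanager's default port
```

Without an alerting section, rules are still evaluated and their state is visible in the Prometheus UI, but no notifications are sent.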

Syntax:

# The name of the alert. Must be a valid label value.
alert: <string>   # alert name

# The PromQL expression to evaluate. Every evaluation cycle this is
# evaluated at the current time, and all resultant time series become
# pending/firing alerts.
expr: <string>    # custom PromQL expression

# Alerts are considered firing once they have been returned for this long.
# Alerts which have not yet fired for long enough are considered pending.
[ for: <duration> | default = 0s ]    # the alert fires only after the condition has held this long; until then it stays pending

# Labels to add or overwrite for each alert.
labels:
  [ <labelname>: <tmpl_string> ]    # labels attached to the alert

# Annotations to add to each alert.
annotations:
  [ <labelname>: <tmpl_string> ]

Defining alerting rules:

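Label and annotation values can be templated using console templates. The $labels variable holds the label key/value pairs of an alert instance. The configured external labels can be accessed via the $externalLabels variable. The $value variable holds the evaluated value of an alert instance.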

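groups:
- name: Dos端口探針
  rules:
  - alert: Dos端口探針          # alert name
    expr: probe_success{job="Dos-Port-Status"} == 0   # matching condition
    for: 1m                    # the condition must hold this long before the alert fires
    labels:                    # labels section
      severity: critical
      team: "{{ $labels.job }}"   # $labels.job is the job name defined in the main Prometheus configuration file
    annotations:               # annotations section
      summary: '{{ $labels.env }} TCP probe failed'   # env is a label set on the scraped target
      description: '{{ $labels.env }} [{{ $labels.name }}] TCP port probe failed, current value: {{ $value }}'   # env and name are labels set on the scraped target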
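
An alerting rule can also be written against a series produced by a recording rule. The sketch below assumes the job_instance_mode:node_cpu_seconds:avg_rate5m series from the recording rule example in section 1 exists; the group name, alert name, threshold, and duration are illustrative assumptions.

```yaml
groups:
- name: cpu-node-alerts              # hypothetical group name
  rules:
  - alert: NodeCpuIdleLow            # hypothetical alert name
    # The recorded series is the per-mode CPU time rate averaged across cores,
    # so a value below 0.1 for mode="idle" means less than 10% idle time.
    expr: job_instance_mode:node_cpu_seconds:avg_rate5m{mode="idle"} < 0.1
    for: 5m                          # stays pending until the condition has held for 5 minutes
    labels:
      severity: warning
    annotations:
      summary: 'Low idle CPU on {{ $labels.instance }}'
      description: 'Average idle CPU rate over 5m is {{ $value }}'
```

Alerting on the recorded series keeps the alert expression short and avoids recomputing the rate on every evaluation.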

Result when the alert fires
