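instance
- In Prometheus terminology, an endpoint that can be scraped is called an instance.
job
- A collection of instances with the same purpose (for example, processes replicated for scalability or reliability) is called a job.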
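## Example
An entry under `scrape_configs` in which the job `pushgateway` has three instances, one per target: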
```yaml
- job_name: 'pushgateway'
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
    - targets:
        - 172.20.70.205:9091
        - 172.20.70.205:9092
        - 172.20.70.215:9091
```
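To make the instance/job relationship concrete, here is a minimal Python sketch that mirrors the `pushgateway` job above as a plain dict (an assumption; Prometheus parses the YAML itself) and lists the `job`/`instance` labels and scrape URL each target would get when no relabeling is configured. The name `target_labels` is hypothetical.

```python
# Hypothetical in-memory mirror of the 'pushgateway' scrape config above.
scrape_config = {
    "job_name": "pushgateway",
    "scheme": "http",
    "metrics_path": "/metrics",
    "static_configs": [
        {"targets": ["172.20.70.205:9091", "172.20.70.205:9092", "172.20.70.215:9091"]},
    ],
}


def target_labels(config):
    """Return one (job, instance, url) tuple per target, assuming no relabeling."""
    result = []
    for static_config in config["static_configs"]:
        for address in static_config["targets"]:
            # Without relabeling, 'instance' is simply the target's host:port.
            url = f"{config['scheme']}://{address}{config['metrics_path']}"
            result.append((config["job_name"], address, url))
    return result


for job, instance, url in target_labels(scrape_config):
    print(f'job="{job}" instance="{instance}" scrape_url={url}')
```

All three targets share one `job` label but each gets its own `instance` label, which is exactly the "one job, many instances" relationship described above.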
Automatically generated labels and time series
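When Prometheus scrapes a target, it automatically attaches the following labels to the scraped time series to identify the target:
- job: the configured job name that the target belongs to.
- instance: the `<host>:<port>` part of the scraped target's URL.

For each scrape of an instance, Prometheus also stores a sample in the following time series:
- up{job="<job-name>", instance="<instance-id>"}: 1 if the instance is healthy (i.e., reachable), or 0 if the scrape failed.
  - To find instances whose scrape failed, set up an alert on `up == 0`.
- scrape_duration_seconds{job="<job-name>", instance="<instance-id>"}: how long the scrape took.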
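A minimal sketch of the `up` logic: each scrape attempt yields 1 on success and 0 on failure, and an `up == 0` check keeps only the failed targets. The scrape outcomes below are made up for illustration, and `up_series` / `failed_instances` are hypothetical helpers, not Prometheus code.

```python
# Made-up scrape outcomes: (job, instance) -> whether the scrape succeeded.
scrape_results = {
    ("pushgateway", "172.20.70.205:9091"): True,
    ("pushgateway", "172.20.70.205:9092"): False,  # e.g. connection refused
    ("pushgateway", "172.20.70.215:9091"): True,
}


def up_series(results):
    """Map each (job, instance) to the value that would be stored in `up`."""
    return {key: 1 if ok else 0 for key, ok in results.items()}


def failed_instances(up):
    """Rough equivalent of the PromQL filter `up == 0`."""
    return sorted(key for key, value in up.items() if value == 0)


up = up_series(scrape_results)
for job, instance in failed_instances(up):
    print(f'ALERT: up{{job="{job}", instance="{instance}"}} == 0')
```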
Example:
```text
scrape_duration_seconds{instance="172.20.70.205", job="blackbox-ssh"} 0.001817932
scrape_duration_seconds{instance="172.20.70.205:3000", job="single-targets"} 0.005416658
scrape_duration_seconds{instance="172.20.70.205:9091", job="pushgateway"} 0.002726714
scrape_duration_seconds{instance="172.20.70.205:9092", job="pushgateway"} 0.000506256
scrape_duration_seconds{instance="172.20.70.205:9100", job="single-targets"} 0.012790691
scrape_duration_seconds{instance="172.20.70.205:9104", job="single-targets"} 0.021421043
scrape_duration_seconds{instance="172.20.70.205:9115", job="blackbox-http-targets"} 0.00427973
```
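Use: find the instances in a job whose scrapes take comparatively long.
- Why might a scrape be slow?
  - Poor network quality.
  - The target exposes too much metric data.
  - The Prometheus server has hit a scraping bottleneck and needs to be scaled out.
- The five job+instance pairs with the slowest last scrape: `topk(5, scrape_duration_seconds)`
- Scrapes that took longer than 3 seconds: `scrape_duration_seconds > 3`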
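The two queries above can be mimicked in Python on the example values, as a rough sketch of what `topk` and a comparison filter do (PromQL returns labelled series rather than a Python list). The helper names `topk` and `greater_than` are hypothetical.

```python
import heapq

# Values copied from the scrape_duration_seconds example above.
scrape_duration_seconds = {
    ("blackbox-ssh", "172.20.70.205"): 0.001817932,
    ("single-targets", "172.20.70.205:3000"): 0.005416658,
    ("pushgateway", "172.20.70.205:9091"): 0.002726714,
    ("pushgateway", "172.20.70.205:9092"): 0.000506256,
    ("single-targets", "172.20.70.205:9100"): 0.012790691,
    ("single-targets", "172.20.70.205:9104"): 0.021421043,
    ("blackbox-http-targets", "172.20.70.205:9115"): 0.00427973,
}


def topk(k, series):
    """Rough analogue of PromQL topk(k, ...): the k series with the largest values."""
    return heapq.nlargest(k, series.items(), key=lambda item: item[1])


def greater_than(threshold, series):
    """Rough analogue of `series > threshold`: drop series at or below the threshold."""
    return {key: value for key, value in series.items() if value > threshold}


for (job, instance), seconds in topk(5, scrape_duration_seconds):
    print(f"{job:24} {instance:22} {seconds:.6f}s")

print("slower than 3s:", greater_than(3, scrape_duration_seconds))
```

With these values the slowest target is `172.20.70.205:9104`, and nothing exceeds 3 seconds, so the threshold filter returns an empty result, just as the PromQL query would return no series.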
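- scrape_samples_post_metric_relabeling{job="<job-name>", instance="<instance-id>"}: the number of samples remaining after metric relabeling was applied.
  - What is a sample? Roughly speaking, each unique label set (metric name plus labels, i.e. one series) contributes one sample per scrape.
- scrape_samples_scraped{job="<job-name>", instance="<instance-id>"}: the number of samples the target exposed.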
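To see why `scrape_samples_post_metric_relabeling` can be lower than `scrape_samples_scraped`, here is a sketch that applies one drop rule, similar in spirit to a `metric_relabel_configs` entry with `action: drop` matching on `__name__`, to a few made-up samples. The sample names, the `go_.*` regex and `apply_drop_rule` are assumptions for illustration only.

```python
import re

# Made-up samples from one scrape: (metric name, labels, value).
scraped = [
    ("go_gc_duration_seconds", {"quantile": "0.5"}, 0.0001),
    ("go_gc_duration_seconds", {"quantile": "0.9"}, 0.0003),
    ("go_goroutines", {}, 42.0),
    ("node_load1", {}, 0.3),
]

# Hypothetical drop rule: discard every sample whose metric name matches.
# Prometheus relabel regexes are fully anchored, so fullmatch mirrors that.
DROP_NAME_REGEX = re.compile(r"go_.*")


def apply_drop_rule(samples, pattern):
    """Keep only samples whose metric name does not fully match `pattern`."""
    return [s for s in samples if not pattern.fullmatch(s[0])]


kept = apply_drop_rule(scraped, DROP_NAME_REGEX)
print("scrape_samples_scraped:", len(scraped))
print("scrape_samples_post_metric_relabeling:", len(kept))
```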
Example: `topk(5, scrape_samples_scraped)`
```text
scrape_samples_scraped{instance="172.20.70.205:9256", job="single-targets"} 1691
scrape_samples_scraped{instance="172.20.70.215:9256", job="single-targets"} 1010
scrape_samples_scraped{instance="172.20.70.205:9104", job="single-targets"} 816
scrape_samples_scraped{instance="172.20.70.215:9100", job="single-targets"} 500
scrape_samples_scraped{instance="172.20.70.205:9100", job="single-targets"} 500
```
Use: count samples broken down by job+instance.
Ranked by job: `topk(5, sum(scrape_samples_scraped) by (job))`
```text
{job="single-targets"} 4957
{job="redis_exporter_targets"} 299
{job="pushgateway"} 102
{job="blackbox-http-targets"} 72
{job="blackbox-ssh"} 6
```
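The per-job ranking is an aggregation followed by `topk`. A minimal Python sketch of `sum(...) by (job)`, using only the five instance values shown in the first example; because the other `single-targets` instances are not listed there, the total here (4517) is smaller than the 4957 reported for the whole job. The helper names are hypothetical.

```python
import heapq
from collections import defaultdict

# The five scrape_samples_scraped values listed in the topk example above.
scrape_samples_scraped = {
    ("single-targets", "172.20.70.205:9256"): 1691,
    ("single-targets", "172.20.70.215:9256"): 1010,
    ("single-targets", "172.20.70.205:9104"): 816,
    ("single-targets", "172.20.70.215:9100"): 500,
    ("single-targets", "172.20.70.205:9100"): 500,
}


def sum_by_job(series):
    """Rough analogue of sum(...) by (job): add up values sharing a job label."""
    totals = defaultdict(int)
    for (job, _instance), value in series.items():
        totals[job] += value
    return dict(totals)


def topk(k, series):
    """Rough analogue of PromQL topk(k, ...)."""
    return heapq.nlargest(k, series.items(), key=lambda item: item[1])


print(topk(5, sum_by_job(scrape_samples_scraped)))  # [('single-targets', 4517)]
```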
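- scrape_series_added{job="<job-name>", instance="<instance-id>"}: the approximate number of new series in this scrape. New in v2.10.
  - Use: count newly added series; this helps spot write peaks.
  - In most cases, a scrape should mostly append samples to series that already exist.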
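A simplified model of what `scrape_series_added` measures, assuming a series is identified by its metric name plus sorted label pairs and that we only compare against the previous scrape. Prometheus's own bookkeeping differs (the metric is documented as approximate); `series_key`, `series_added` and the `http_requests_total` data are hypothetical.

```python
def series_key(name, labels):
    """Build a hashable identity for a series: metric name + sorted label pairs."""
    return (name,) + tuple(sorted(labels.items()))


def series_added(previous, current):
    """Count series present in `current` but absent from `previous`."""
    return len(set(current) - set(previous))


scrape_1 = {
    series_key("http_requests_total", {"path": "/", "code": "200"}),
    series_key("http_requests_total", {"path": "/login", "code": "200"}),
}
# Next scrape: the same series again, plus one brand-new label combination.
scrape_2 = scrape_1 | {
    series_key("http_requests_total", {"path": "/login", "code": "500"}),
}

print("series added in 2nd scrape:", series_added(scrape_1, scrape_2))  # 1
```

On a target's first scrape every series counts as new, which is why a burst of new targets or a new label value shows up as a write peak.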
# Prometheus special labels
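- `__address__`: the address (`<host>:<port>`) of the endpoint to scrape.
- `__name__`: the metric name.
- `instance`: the label that finally identifies the endpoint; if relabeling does not set it, it defaults to the value of `__address__`.
- `job`: the name of the scrape job.
- `__metrics_path__`: the HTTP path to scrape, such as `/metrics` or `/cadvisor/metrics`.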
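A sketch of how these target labels are resolved once relabeling has finished, based on documented behavior: the scrape URL is built from `__scheme__`, `__address__` and `__metrics_path__`; `instance` falls back to `__address__` if it was not set; and labels starting with `__` are then removed from the target's label set. `__name__` is different: it names the metric on each scraped series and is not a target label. The function `finalize_target_labels` is hypothetical.

```python
def finalize_target_labels(labels):
    """Return (scrape_url, final_labels) for one target after relabeling."""
    # Assumed defaults, matching Prometheus's scrape_config defaults.
    scheme = labels.get("__scheme__", "http")
    path = labels.get("__metrics_path__", "/metrics")
    url = f"{scheme}://{labels['__address__']}{path}"

    final = dict(labels)
    # `instance` falls back to `__address__` when relabeling did not set it.
    final.setdefault("instance", labels["__address__"])
    # Double-underscore labels are internal and are dropped from the target.
    final = {k: v for k, v in final.items() if not k.startswith("__")}
    return url, final


url, final = finalize_target_labels({
    "__address__": "172.20.70.205:9100",
    "__scheme__": "http",
    "__metrics_path__": "/metrics",
    "job": "single-targets",
})
print(url)    # http://172.20.70.205:9100/metrics
print(final)  # {'job': 'single-targets', 'instance': '172.20.70.205:9100'}
```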