1. Background
When there are many physical machines, whether we use "file_sd_config" or "kubernetes_sd_config" we have to write each target (and create each target Service) by hand before Prometheus can discover it. To avoid this repetitive work, we can use Consul as a registry: every machine registers itself with Consul, and Prometheus then discovers the targets from Consul and adds them to its target list automatically.
2. Deploying Consul
Consul is deployed inside the K8s cluster, so we can install it directly with Helm.
- Fetch the consul chart's values
# Add the helm repo
~] helm repo add hashicorp https://helm.releases.hashicorp.com
# Check that the repo was added successfully
~] helm repo list
NAME URL
hashicorp https://helm.releases.hashicorp.com
# Fetch consul's values; see the official docs for what each parameter means
# https://developer.hashicorp.com/consul/docs/k8s/helm
~] helm inspect values hashicorp/consul > /gensee/k8s_system/consul/values.yaml
~] vim /gensee/k8s_system/consul/values.yaml
server:
...
  # Run 3 consul servers to form a cluster and keep the data highly available
  replicas: 3
  # Must match the value of server.replicas
  bootstrapExpect: 3
  # Storage each server uses for its data; the default is 100Gi
  storage: 100Gi
  # Name of the storageClass, if you have one
  storageClass: managed-nfs-storage
...
- Create the namespace and install consul
~] kubectl create ns consul
~] helm install consul hashicorp/consul --namespace consul --values /gensee/k8s_system/consul/values.yaml
# 檢視consul安裝狀态
~] helm status consul -n consul
~] kubectl get all -n consul
NAME READY STATUS RESTARTS AGE
pod/consul-consul-connect-injector-89d7cf9d8-vqzgm 1/1 Running 7 3d23h
pod/consul-consul-server-0 1/1 Running 0 3d22h
pod/consul-consul-server-1 1/1 Running 0 3d22h
pod/consul-consul-server-2 1/1 Running 0 3d22h
pod/consul-consul-webhook-cert-manager-649d7486d7-x2pn5 1/1 Running 0 3d23h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/consul-consul-connect-injector ClusterIP 172.13.185.211 <none> 443/TCP 3d23h
service/consul-consul-dns ClusterIP 172.13.152.148 <none> 53/TCP,53/UDP 3d23h
service/consul-consul-server ClusterIP None <none> 8500/TCP,8502/TCP,8301/TCP,8301/UDP,8302/TCP,8302/UDP,8300/TCP,8600/TCP,8600/UDP 3d23h
# The svc that exposes the UI
service/consul-consul-ui ClusterIP 172.4.246.29 <none> 80/TCP 3d23h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/consul-consul-connect-injector 1/1 1 1 3d23h
deployment.apps/consul-consul-webhook-cert-manager 1/1 1 1 3d23h
NAME DESIRED CURRENT READY AGE
replicaset.apps/consul-consul-connect-injector-89d7cf9d8 1 1 1 3d23h
replicaset.apps/consul-consul-webhook-cert-manager-649d7486d7 1 1 1 3d23h
NAME READY AGE
statefulset.apps/consul-consul-server 3/3 3d23h
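Once the pods are up, it is worth confirming that the three servers actually formed a raft cluster. A minimal check script (a sketch — it assumes `kubectl` access to the cluster and the release/pod names shown above); it is written to a file here so it can be reviewed before running:

```shell
# Write the check to a file so it can be reviewed before running against the cluster
cat > check-consul.sh <<'EOF'
#!/bin/sh
# All three servers should show up as "alive"
kubectl exec -n consul consul-consul-server-0 -- consul members
# Exactly one server should be listed as the raft leader
kubectl exec -n consul consul-consul-server-0 -- consul operator raft list-peers
EOF
# Sanity-check the script's syntax locally
sh -n check-consul.sh && echo "syntax OK"
```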
3. Proxying Consul with Nginx
Since Consul is deployed inside the K8s cluster and the cluster is not directly exposed, nginx proxies traffic to the in-cluster Consul to serve the UI.
server {
    listen 80;
    server_name xxx-xxx.xxxxx.xxx;
    return 301 https://$host$request_uri;
}
server {
    # For https
    listen 443 ssl;
    # listen [::]:443 ssl ipv6only=on;
    ssl_certificate sslkey/server.cer;
    ssl_certificate_key sslkey/server.key;
    server_name xxx-xxx.xxxxx.xxx; # replace with your domain
    access_log logs/consul-access.log;
    error_log logs/consul-error.log;
    location / {
        # This svc is the consul UI
        proxy_pass http://consul-consul-ui.consul.svc.cluster.local;
        # Add nginx's built-in basic auth
        auth_basic "Basic Authentication";
        auth_basic_user_file "/etc/nginx/conf/system/.htpasswd";
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
    }
}
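The file referenced by `auth_basic_user_file` has to exist before nginx will accept any login. One way to create an entry, assuming `openssl` is available (the `htpasswd` tool from apache2-utils/httpd-tools does the same job); the username and password below are placeholders:

```shell
# Generate an htpasswd-compatible apr1 entry; "admin"/"changeme" are placeholder credentials
USER=admin
PASS=changeme
printf '%s:%s\n' "$USER" "$(openssl passwd -apr1 "$PASS")" > .htpasswd
# Each line has the form user:$apr1$<salt>$<hash>
cat .htpasswd
```

Then place the file at /etc/nginx/conf/system/.htpasswd and reload nginx.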
4. Registering with Consul
# id and the second field of tags are the hostname; xxx.xx.x.xxx is the host's IP address; keep everything else as-is
~] curl -u xxxxx:xxxxx -X PUT -d \
'{"id": "xxx-xxx-xxxx","name": "node_exporter","address": "xxx.xx.x.xxx","port": 9100,"tags":["prometheus","xxx-xxx-xxxx"],"checks": [{"http": "http://xxx.xx.x.xxx:9100/metrics", "interval": "5s"}]}' \
https://xxx-xxx.xxxxx.xxx/v1/agent/service/register
# Deregister from consul (node-exporter is the id). If deregistration fails, the request did not reach the consul node holding this registration, so run it again
~] curl -X PUT http://xxx-xxx.xxxxx.xxx/v1/agent/service/deregister/node-exporter
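The registration body is easier to get right if it is built in a file and validated locally before being PUT to `/v1/agent/service/register`. A sketch with placeholder hostname and IP values:

```shell
# Placeholder values -- substitute the real hostname and IP before registering
NODE="node-01"
IP="10.0.0.10"
cat > payload.json <<EOF
{"id": "${NODE}",
 "name": "node_exporter",
 "address": "${IP}",
 "port": 9100,
 "tags": ["prometheus", "${NODE}"],
 "checks": [{"http": "http://${IP}:9100/metrics", "interval": "5s"}]}
EOF
# Validate the JSON before sending it
python3 -m json.tool payload.json
# Then: curl -u user:pass -X PUT -d @payload.json https://<consul-host>/v1/agent/service/register
```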
5. Configuring consul_sd_config in prometheus
# Same as section 4.1 of this doc (https://www.yuque.com/kkkfree/itbe1d/zzzgu0): edit prometheus-additional.yaml and add the following under kubernetes_sd_configs
- job_name: 'consul-prometheus'
  consul_sd_configs:
  # Since everything runs in the K8s cluster, the consul server svc can be reached directly; no need to go through the ui svc
  - server: 'consul-consul-server.consul.svc.cluster.local:8500'
    services:
    # Which service to pull targets from; all my machines registered under node_exporter, so only that one is scraped
    - 'node_exporter'
  relabel_configs:
  # Replace the registered id label with hostname, which is easier to read
  - source_labels: [__meta_consul_service_id]
    action: replace
    target_label: hostname
# Reload the prometheus config
~] kubectl delete secret additional-configs -n monitoring
~] kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
# Manually trigger prometheus to re-read the config (IP below is the prometheus pod's IP)
~] kubectl get pods -o wide -n monitoring
~] kubectl get pods -o wide -l app=prometheus -n monitoring
~] curl -X POST http://IP:9090/-/reload
6. Results
- Instances shown under the node_exporter service in consul
- Endpoints auto-discovered by prometheus