1.2020-08-25
2020-08-25
Prometheus å®ç°é®ä»¶åè¦ï¼Prometheus+Alertmanager+QQé®ç®±æè ç½æé®ç®±ï¼ç®åæµè¯è¿è¿ä¸¤ç§é®ç®±é½å¯ä»¥åéåè¦é®ä»¶ï¼
Prometheuså®ç°é®ä»¶åè¦åçå¦ä¸ï¼
Prometheuså®æ¹æä¸ä¸ªé带çä¸é´ä»¶ï¼alertmanagerï¼éè¿è®¾ç½®rulesè§ååè·¯ç±è½¬åå¯ä»¥å®ç°é®ä»¶åè¦ï¼åææ¯ä½ éè¦æä¸ä¸ªå¯ä»¥åéé®ä»¶çé®ä»¶æå¡ç«¯ï¼å¯ä»¥èªå»ºæè 使ç¨äºèç½å ¬å¸æä¾çå è´¹é®ç®±ï¼
åè¦åçå¾
Prometheuså®æ´æ¶æå¾
æä¹åå¾åºçé误ç»è®ºå¦ä¸ï¼
æ¨èç´æ¥å¨èææºæä½ç³»ç»ä¸ç´æ¥å®è£ PrometheusåAlertmanagerï¼ä¸æ¨èå ¶ä¸ä»»ä½ä¸æ¹å¨å®¹å¨ä¸è¿è¡ï¼å 为æµè¯è¿å¨å®¹å¨ä¸è¿è¡Prometheusåalertmanagerï¼ç»æåºç°å¦ä¸é误æ åµ
第ä¸ç§æ åµæ¯ï¼æçnode-exporteræ线è·æºäºï¼æå¨å ³æºï¼æ¨¡æçªç¶æ线è·æºï¼ï¼Prometheuså´æ示èç¹ä¾ç¶å¨çº¿ï¼ææ¶åå´è½å¤æ£å¸¸æ¾ç¤ºèç¹æ线è·æºï¼çæåè¦åéé®ä»¶
第äºç§æ åµæ¯ï¼æçnode-exporteræ线è·æºäºï¼æå¨å ³æºï¼æ¨¡æçªç¶æ线è·æºï¼ï¼Prometheusæ示èç¹æ线ï¼åè¦çæï¼ä½æ¯æ²¡æåéé®ä»¶ï¼ææå¨æ¢å¤node-exporteråï¼åè¦è§£é¤ï¼é®ä»¶è½æ£å¸¸åéé®ä»¶æ示åè¦å·²ç»è§£é¤ãããã
第ä¸ç§æ åµæ¯ï¼æçnode-exporteræ线è·æºäºï¼æå¨å ³æºï¼æ¨¡æçªç¶æ线è·æºï¼ï¼Prometheusæ示èç¹æ线ï¼åè¦çæï¼æ£å¸¸æååéé®ä»¶ï¼ææå¨æ¢å¤node-exporteråï¼åè¦è§£é¤ï¼é®ä»¶æ²¡æåéåºæ¥ãããã
以ä¸ä¸ç§æ åµä¹åç»å¸¸åºç°ï¼å½æ¶ç¬¬ä¸æ¥ä»¥ä¸ºæ¯èªå·±è®¾ç½®çscrape_intervalä¸åç导è´çï¼ç»æè°è¯å 次ï¼é®é¢æ²¡æ解å³ï¼ç¬¬äºæ¥ä»¥ä¸ºæ¯èªå·±çæå¡å¨æ¶é´æ²¡æåå°ç²¾ç¡®åæ¥ï¼ç¶åæå»è®¾ç½®åé¿éäºçntpæå¡å¨åæ¥ï¼ç»æé®é¢ä¾ç¶æ²¡æ解å³ï¼ç¬¬ä¸æ¥ï¼æ¢ä¸ªæ¹åï¼æalertmanagerè¿ç§»å°èææºæä½ç³»ç»ä¸å®è£ è¿è¡ï¼é®é¢è§£å³ï¼
å京æ¶é´æ¯GMT+8å°æ¶ï¼æäºåå¿çæ¶é´å¯è½æ¯UTCçï¼ä½æ¯å¦ææ¯å¨è¦æ±ä¸å¤ªåå精确çæ åµä¸ï¼UTCæ¶é´æ¯åå好çäºGMTæ¶é´
为äºé¿å æ¶åºçæ··ä¹±ï¼prometheusææçç»ä»¶å é¨é½å¼ºå¶ä½¿ç¨Unixæ¶é´ï¼å¯¹å¤å±ç¤ºä½¿ç¨GMTæ¶é´ã
è¦æ¹æ¶åºæ两个åæ³
1 .ä¿®æ¹æºç ï¼éæ°ç¼è¯ã
2. ä½¿ç¨ docker è¿è¡ Prometheusï¼æè½½æ¬å°æ¶åºæ件
docker run --restart always -e TZ=Asia/Shanghai --hostname prometheus --name prometheus-server -d -p : -v /data/prometheus/server/data:/prometheus -v /data/prometheus/server/conf/prometheus.yml:/etc/prometheus/prometheus.yml -u root prom/prometheus:v2.5.0
æ£æå¼å§
å®è£ alertmanager
容å¨å®è£ æ¹å¼ï¼
docker run -d --name alertmanager -p : -v /usr/local/Prometheus/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml prom/alertmanager:latest
å å¨å®¿ä¸»æº/usr/local/Prometheusä¸å建ä¸ä¸ªæ件夹alertmanagerï¼ç¶åå¨æ件夹éå建alertmanager.ymlé ç½®æ件ï¼å¾ ä¼æè½æ å°å°alertmanager容å¨éç/etc/alertmanagerç®å½ä¸
globalï¼å ¨å±é ç½®
resolve_timeout: é®é¢è§£å³çè¶ æ¶æ¶é´
smtp_from: åéåè¦é®ä»¶çé®ç®±è´¦å·
smtp_smarthost: é®ç®± SMTP æå¡å°å,源码redis项目源码è¿éæ¯ä»¥QQé®ç®±ä¸ºä¾ï¼ä¹å¯ä»¥ç¨ç½æé®ç®±ï¼è¿ä¸ªåæä¹å设置zabbixé®ä»¶åè¦æ¶çé ç½®ä¸æ ·
smtp_auth_username: å¦æ没æ设置é®ç®±å«åï¼é£å°±æ¯è´¦æ·å
smtp_auth_password: é®ç®±çææç ï¼ä¸æ¯ è´¦æ·å¯ç ï¼ä½ å¯ä»¥å¨QQé®ç®±æè ç½æé®ç®±ç½é¡µç«¯è®¾ç½®ï¼å¼å¯ POP3/SMTP æå¡æ¶ä¼æ示ï¼åé ç½®zabbixé®ä»¶åè¦çæ¶åå ä¹ä¸æ ·
smtp_require_tls: æ¯å¦ä½¿ç¨ tlsï¼æ ¹æ®ç¯å¢ä¸åï¼æ¥éæ©å¼å¯åå ³éãå¦ææ示æ¥é email.loginAuth failed: Must issue a STARTTLS command firstï¼é£ä¹å°±éè¦è®¾ç½®ä¸º trueãçé说æä¸ä¸ï¼å¦æå¼å¯äº tlsï¼æ示æ¥é starttls failed: x: certificate signed by unknown authorityï¼éè¦å¨ email_configs ä¸é ç½® insecure_skip_verify: true æ¥è·³è¿ tls éªè¯ã
templatesï¼ åè¦æ¨¡æ¿ç®å½ï¼å¯ä»¥ä¸ç¼å模æ¿ï¼æé»è®¤æ¨¡æ¿
Subject: '{ { template "email.default.subject" . }}'
html: '{ { template "email.default.html" . }}'
routeï¼æ¥è¦çåå设置
group_byï¼åç»
group_wait: åç»çå¾ æ¶é´
group_interval: 5m æ¯ç»æ¶é´é´é
repeat_interval: m éå¤é´é
receiver: æ¥æ¶æ¹å¼ï¼è¯·æ³¨æï¼è¿éçååè¦å¯¹åºä¸é¢receiversä¸çä»»ä½ä¸ä¸ªååï¼ä¸ç¶ä¼æ¥éï¼è¿éå ¶å®å°±æ¯éæ©æ¹å¼ï¼æé®ç®±ï¼ä¼ä¸å¾®ä¿¡ï¼wehookï¼victoropsçç
receiversï¼æ¥åæ¹å¼æ±æ»ï¼å³åè¦æ¹å¼æ±æ»
ä¾åï¼
receivers:
- name:'default-receiver'
email_configs:
- to:'whiiip@.com'
html: '{ { template "alert.html" . }}'
headers: { Subject: "[WARN] æ¥è¦é®ä»¶test"}
inhibit_rules: æå¶è§å
å½åå¨ä¸å¦ä¸ç»å¹é çè¦æ¥ï¼æºï¼æ¶ï¼æå¶è§åå°ç¦ç¨ä¸ä¸ç»å¹é çè¦æ¥ï¼ç®æ ï¼ã
å æ¬æºå¹é åç®æ å¹é
alertmanagerå®æ¹æ¯è¿æ ·è¯´ç
Inhibition
Inhibition is a concept of suppressing notifications for certain alerts if certain other alerts are already firing.
Example: An alert is firing that informs that an entire cluster is not reachable. Alertmanager can be configured to mute all other alerts concerning this cluster if that particular alert is firing. This prevents notifications for hundreds or thousands of firing alerts that are unrelated to the actual issue.
Inhibitions are configured through the Alertmanager's configuration file.
å½åå¨ä¸å¦ä¸ç»å¹é å¨å¹é çè¦æ¥ï¼æºï¼æ¶ï¼ç¦æ¢è§åä¼ä½¿ä¸ä¸ç»å¹é å¨å¹é çè¦æ¥ï¼ç®æ ï¼éé³ãç®æ è¦æ¥åæºè¦æ¥çequalå表ä¸çæ ç¾å称é½å¿ é¡»å ·æç¸åçæ ç¾å¼ã
å¨è¯ä¹ä¸ï¼ç¼ºå°æ ç¾å带æ空å¼çæ ç¾æ¯åä¸ä»¶äºãå æ¤ï¼å¦æequalæºè¦æ¥åç®æ è¦æ¥é½ç¼ºå°ååºçæææ ç¾å称ï¼åå°åºç¨ç¦æ¢è§åã
为äºé²æ¢è¦æ¥ç¦æ¢èªèº«ï¼ä¸è§åçç®æ åæºç«¯ é½ å¹é çè¦æ¥ä¸è½è¢«è¦æ¥ï¼å æ¬å ¶æ¬èº«ï¼ä¸ºçæ¥ç¦æ¢ãä½æ¯ï¼æ们建议éæ©ç®æ å¹é å¨åæºå¹é å¨ï¼ä»¥ä½¿è¦æ¥æ°¸è¿ä¸ä¼åæ¶å¹é åæ¹ãè¿å¾å®¹æè¿è¡æ¨çï¼å¹¶ä¸ä¸ä¼è§¦åæ¤ç¹æ®æ åµã
æ¥çæ¯è§årules
ä¸è§£éäºï¼èªå·±ç 究å®æ¹ææ¡£
alertmanagerçé容å¨å®è£ æ¹å¼æ¯
wget /prometheus/alertmanager/releases/download/v0..0/alertmanager-0..0.linux-amd.tar.gz
tar xf alertmanager-0..0.linux-amd.tar.gz
mv alertmanager-0..0.linux-amd /usr/local/alertmanager
vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
Documentation=/prometheus/alertmanager
After=network.target
[Service]
Type=simple
User=root
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
Restart=on-failure
[Install]
WantedBy=multi-user.target
Alertmanager å®è£ ç®å½ä¸é»è®¤æ alertmanager.yml é ç½®æ件ï¼å¯ä»¥å建æ°çé ç½®æ件ï¼å¨å¯å¨æ¶æå®å³å¯ã
å ¶ä½æ¹å¼åä¸é¢ä¸æ ·
æ¥çæ¯Prometheusï¼æä¹åçå客éæåäºå®¹å¨å®è£ åé容å¨å®è£ çæ¹æ³ï¼èªå·±å»ç¿»é
ç¶åæ¯å¨prometheus.ymléä¿®æ¹ç¸å ³é ç½®
é¦å å»æalertmanagerç注éï¼æ¹æIPå ä½ è®¾ç½®ç端å£å·ï¼é»è®¤æ¯
æ¥çå¨rule_files: ä¸é¢åä¸è§åæ件çç»å¯¹è·¯å¾ï¼å¯ä»¥æ¯å ·ä½æ件åï¼ä¹å¯ä»¥æ¯*ï¼ä¹å¯ä»¥åå 级æ件ï¼*é»è®¤æ¯å ¨é¨å¹é
æ¥çæ¯è¢«çæ§é¡¹ç设置ï¼è¿é设置å®æå¯ä»¥å¨Prometheusç½é¡µéçtargetséçå¾å°
请注æï¼è¿é设置çåæ°ååè¦åruleè§åä¸è®¾ç½®çåæ°ååä¸æ¨¡ä¸æ ·ï¼å¦åä½ çprometheusæå¡ä¼æ æ³å¯å¨ï¼ç¶åæ¥é
å¦æä¸å¨ç¹å®çjobä¸è®¾ç½®scrape_intervalï¼ä¼å 级é«äºå ¨å±ï¼,åé»è®¤éç¨gobalä¸çscrape_interval
æå模æèç¹æ线ï¼æå¨å ³énode-exporteræè Cadvisor
docker stop node-exporter æè 容å¨ID
docker stop cadvisor æè 容å¨ID
æè æup{ { job='prometheus'}} == 1 设置æ1ï¼åå设置ï¼ä¸ç¨å ³ææå¡ï¼å°±å¯ä»¥ççåè¦æä¸æå
说æä¸ä¸ Prometheus Alert åè¦ç¶ææä¸ç§ç¶æï¼InactiveãPendingãFiringã
Inactiveï¼éæ´»å¨ç¶æï¼è¡¨ç¤ºæ£å¨çæ§ï¼ä½æ¯è¿æªæä»»ä½è¦æ¥è§¦åã
Pendingï¼è¡¨ç¤ºè¿ä¸ªè¦æ¥å¿ 须被触åãç±äºè¦æ¥å¯ä»¥è¢«åç»ãåæ/æå¶æéé»/éé³ï¼æ以çå¾ éªè¯ï¼ä¸æ¦ææçéªè¯é½éè¿ï¼åå°è½¬å° Firing ç¶æã
Firingï¼å°è¦æ¥åéå° AlertManagerï¼å®å°æç §é ç½®å°è¦æ¥çåéç»æææ¥æ¶è ãä¸æ¦è¦æ¥è§£é¤ï¼åå°ç¶æè½¬å° Inactiveï¼å¦æ¤å¾ªç¯ã
没æé ç½®åè¦æ¨¡æ¿æ¶çé»è®¤åè¦æ ¼å¼æ¯è¿æ ·ç
èç¹æ¢å¤åé®ä»¶åç¥æ¯è¿æ ·ç
åäºæ¨¡æ¿åæ¯è¿æ ·ç
è¿è¦éæ°æ å°æ¨¡æ¿æ件夹路å¾å°alertmanager容å¨éçç¸å¯¹è·¯å¾ï¼ç¶åéå¯alertmanagerï¼å½ç¶ï¼å¦æç®å½ä¸æ²¡æ模æ¿æ件ï¼åä¸æ¾ç¤º
åè¦æ¨¡æ¿
å¨alertmanager.ymlä¸ä¿®æ¹ç¸å ³è®¾ç½®
éå¯alertmanager
docker restart alertmanager
æç»ææä¸æ¯å¾å¥½