
Configuring DingTalk alerts for a Prometheus + Grafana + Alertmanager monitoring system

wxchong 2024-07-08 23:38:09 Open-source technology

Overview

Most of our day-to-day work now happens in DingTalk, so this post shows how to configure DingTalk alerts for Prometheus. It assumes Alertmanager is already deployed.


I. Configure Go

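Prometheus and its ecosystem are written in Go, and building the prometheus-webhook-dingtalk plugin from source requires a Go toolchain, so first install Go. Go is cross-platform (Windows, Linux, macOS and others), and its source code is available if you prefer to compile it yourself. If you install the prebuilt plugin binary instead, Go is not needed to run it.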

Download: https://studygolang.com/dl

1. Extract

# tar -xvf go1.13.linux-amd64.tar.gz -C /usr/local/

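2. Configure the environment variable (single quotes keep $PATH from being expanded when the line is written to /etc/profile)

echo 'export PATH=$PATH:/usr/local/go/bin' >> /etc/profile
source /etc/profile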

3. Test

Verify the installation with go version:

# go version
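`go version` only shows that the binary is on the PATH. Compiling and running a tiny program also confirms that the toolchain can build code. Below is a minimal sketch; the file name hello.go is arbitrary.

```go
package main

import (
	"fmt"
	"runtime"
)

// toolchainInfo reports the Go version this program was built with
// and the operating system/architecture it targets.
func toolchainInfo() string {
	return fmt.Sprintf("%s %s/%s", runtime.Version(), runtime.GOOS, runtime.GOARCH)
}

func main() {
	fmt.Println("Go toolchain OK:", toolchainInfo())
}
```

Run it with `go run hello.go`. With the package installed above, it should report the installed version together with linux/amd64.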

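II. Configure the DingTalk robot

1. Open robot management

2. Choose Webhook

3. Choose the group

4. Check the robot settings and copy the Webhook URL (it contains the access_token)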
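Before wiring up Prometheus, you can check that the robot accepts messages by posting a plain-text message to its Webhook URL. The sketch below makes several assumptions:

- It uses DingTalk's custom-robot text message format (`msgtype` set to `text`).
- It reads the full URL from a hypothetical environment variable, DINGTALK_WEBHOOK.
- If the robot's security setting uses a custom keyword, the message content must contain that keyword.
- Signature-based security is not handled.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"time"
)

// textMessage mirrors the JSON body of a DingTalk custom-robot "text" message.
type textMessage struct {
	MsgType string `json:"msgtype"`
	Text    struct {
		Content string `json:"content"`
	} `json:"text"`
}

// buildTextMessage returns the JSON body for a plain-text robot message.
func buildTextMessage(content string) ([]byte, error) {
	var m textMessage
	m.MsgType = "text"
	m.Text.Content = content
	return json.Marshal(m)
}

func main() {
	// Full Webhook URL from the robot settings (https://oapi.dingtalk.com/robot/send?access_token=...),
	// read from a hypothetical environment variable.
	webhook := os.Getenv("DINGTALK_WEBHOOK")
	if webhook == "" {
		fmt.Println("set DINGTALK_WEBHOOK to the robot's Webhook URL")
		return
	}
	body, err := buildTextMessage("Prometheus alert test")
	if err != nil {
		fmt.Println("marshal:", err)
		return
	}
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Post(webhook, "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	// DingTalk answers with JSON; errcode 0 means the message was accepted.
	var result struct {
		ErrCode int    `json:"errcode"`
		ErrMsg  string `json:"errmsg"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		fmt.Println("decode response:", err)
		return
	}
	fmt.Printf("errcode=%d errmsg=%s\n", result.ErrCode, result.ErrMsg)
}
```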


III. Connect DingTalk to the Prometheus Alertmanager webhook

Plugin download: https://github.com/timonwong/prometheus-webhook-dingtalk

1. Install the webhook

--Build from source (create the directory under Go's src directory)
mkdir -p /usr/local/go/src/github.com/timonwong/
cd /usr/local/go/src/github.com/timonwong/
git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git
cd prometheus-webhook-dingtalk
make
--Install the binary package
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz

2. Extract

# tar -xvf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz

The extracted package includes default.tmpl, the template prometheus-webhook-dingtalk uses to format DingTalk alert messages:

/usr/local/dingtalk/prometheus-webhook-dingtalk-0.3.0.linux-amd64/default.tmpl

3. Start prometheus-webhook-dingtalk

nohup ./prometheus-webhook-dingtalk --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=de544211xxxx96f" >dingding.log 2>&1 &
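The value of --ding.profile has the form <name>=<robot Webhook URL>, and the name becomes part of the path that Alertmanager must post to. The URL itself contains "=" (access_token=...), so only the first "=" separates the two parts.

The sketch below illustrates that mapping. It is not the plugin's own parsing code. The /dingtalk/<name>/send pattern and port 8060 are taken from the Alertmanager configuration used in this article.

```go
package main

import (
	"fmt"
	"strings"
)

// splitProfile splits a --ding.profile value of the form "<name>=<webhook URL>".
// Only the first "=" is a separator, because the URL contains "=" itself.
func splitProfile(v string) (name, url string, ok bool) {
	parts := strings.SplitN(v, "=", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", false
	}
	return parts[0], parts[1], true
}

// sendPath returns the path Alertmanager posts to for a profile,
// following the /dingtalk/<name>/send pattern used in this article.
func sendPath(name string) string {
	return "/dingtalk/" + name + "/send"
}

func main() {
	name, url, ok := splitProfile("ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=xxxx")
	if !ok {
		fmt.Println("malformed profile value")
		return
	}
	fmt.Println("profile:  ", name)
	fmt.Println("robot URL:", url)
	fmt.Println("Alertmanager webhook URL: http://localhost:8060" + sendPath(name))
}
```

The systemd unit below uses the profile name sre. For that unit, the same mapping gives http://localhost:8060/dingtalk/sre/send, which is the URL alertmanager.yml must use.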

4. Configure a systemd service

# vim /etc/systemd/system/prometheus-webhook-dingtalk.service
[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/usr/local/dingtalk/prometheus-webhook-dingtalk-0.3.0.linux-amd64/prometheus-webhook-dingtalk --ding.profile=sre=https://oapi.dingtalk.com/robot/send?access_token=de544xxx8ebc04e8da096f

[Install]
WantedBy=multi-user.target

# chmod 644 /etc/systemd/system/prometheus-webhook-dingtalk.service
# systemctl daemon-reload
# systemctl enable prometheus-webhook-dingtalk
# systemctl start prometheus-webhook-dingtalk
# systemctl status prometheus-webhook-dingtalk
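Once the service is running, you can exercise the plugin without waiting for a real alert. To do so, post a payload in the format that Alertmanager's webhook_configs receiver sends.

The sketch below fills in a subset of that payload's fields with placeholder values: the alert name TestAlert, the instance, and the externalURL are made up. It posts to http://localhost:8060/dingtalk/sre/send, the URL used in alertmanager.yml in this article. Depending on the plugin version, additional fields may be required.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// webhookAlert is a subset of one alert in Alertmanager's webhook payload.
type webhookAlert struct {
	Status       string            `json:"status"`
	Labels       map[string]string `json:"labels"`
	Annotations  map[string]string `json:"annotations"`
	StartsAt     string            `json:"startsAt"`
	EndsAt       string            `json:"endsAt"`
	GeneratorURL string            `json:"generatorURL"`
}

// webhookPayload is a subset of the top-level JSON object Alertmanager posts.
type webhookPayload struct {
	Version           string            `json:"version"`
	GroupKey          string            `json:"groupKey"`
	Status            string            `json:"status"`
	Receiver          string            `json:"receiver"`
	GroupLabels       map[string]string `json:"groupLabels"`
	CommonLabels      map[string]string `json:"commonLabels"`
	CommonAnnotations map[string]string `json:"commonAnnotations"`
	ExternalURL       string            `json:"externalURL"`
	Alerts            []webhookAlert    `json:"alerts"`
}

// testPayload builds a payload containing a single firing alert with placeholder values.
func testPayload(alertName, instance, startsAt string) webhookPayload {
	labels := map[string]string{"alertname": alertName, "instance": instance, "team": "node"}
	annotations := map[string]string{"Server": instance, "explain": "test alert, please ignore"}
	return webhookPayload{
		Version:           "4",
		GroupKey:          "{}:{alertname=\"" + alertName + "\"}",
		Status:            "firing",
		Receiver:          "default-receiver",
		GroupLabels:       map[string]string{"alertname": alertName},
		CommonLabels:      labels,
		CommonAnnotations: annotations,
		ExternalURL:       "http://localhost:9093",
		Alerts: []webhookAlert{{
			Status:      "firing",
			Labels:      labels,
			Annotations: annotations,
			StartsAt:    startsAt,
			EndsAt:      "0001-01-01T00:00:00Z", // zero time: the alert has not ended
		}},
	}
}

func main() {
	p := testPayload("TestAlert", "localhost:19999", time.Now().UTC().Format(time.RFC3339))
	body, err := json.Marshal(p)
	if err != nil {
		fmt.Println("marshal:", err)
		return
	}
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Post("http://localhost:8060/dingtalk/sre/send", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	// A 2xx status typically means the plugin accepted the payload; check the DingTalk group.
	fmt.Println("plugin responded:", resp.Status)
}
```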

IV. Configure the Alertmanager email sender and the DingTalk webhook

Configuration file: /usr/local/alertmanager/alertmanager.yml

global:
  resolve_timeout: 5m
  # Email sender settings
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '1275758000@qq.com'
  smtp_auth_username: '1275758000@qq.com'
  smtp_auth_password: 'nxxxegb'
  smtp_require_tls: false
route:
  group_by: ['alertname', 'cluster', 'service']
  receiver: default-receiver
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 30m
receivers:
  - name: 'default-receiver'
    email_configs:
      - to: '1430985018@qq.com,644642050@qq.com'
    # Connect to the service started by prometheus-webhook-dingtalk
    webhook_configs:
      # "sre" is the profile name given to --ding.profile when the webhook was started; the two must match
      - url: 'http://localhost:8060/dingtalk/sre/send'
        send_resolved: true

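repeat_interval: how long Alertmanager waits before re-sending a notification for an alert that is still firing. Set it to suit your needs; a shorter value is handy while debugging.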
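The sketch below shows how group_wait, group_interval and repeat_interval interact for one alert group that keeps firing without changes. It uses a simplified model with three rules:

- The first notification goes out after group_wait.
- The group is then re-checked every group_interval.
- A re-check re-sends only once at least repeat_interval has passed since the last notification.

This is a model, not Alertmanager's actual scheduler. In practice a repeat may arrive up to one group_interval later than the model predicts.

```go
package main

import (
	"fmt"
	"time"
)

// notifySchedule models, in simplified form, when notifications are sent for one
// alert group that keeps firing without changes. Offsets are measured from when
// Alertmanager first receives the alert, and only offsets <= horizon are returned.
func notifySchedule(groupWait, groupInterval, repeatInterval, horizon time.Duration) []time.Duration {
	var sent []time.Duration
	if groupWait > horizon || groupInterval <= 0 {
		return sent
	}
	// First notification: after group_wait.
	last := groupWait
	sent = append(sent, last)
	// Afterwards the group is re-checked every group_interval; a re-check
	// re-sends only if repeat_interval has elapsed since the last notification.
	for t := groupWait + groupInterval; t <= horizon; t += groupInterval {
		if t-last >= repeatInterval {
			sent = append(sent, t)
			last = t
		}
	}
	return sent
}

func main() {
	// Values from alertmanager.yml: group_wait 30s, group_interval 2m, repeat_interval 30m.
	fmt.Println(notifySchedule(30*time.Second, 2*time.Minute, 30*time.Minute, 61*time.Minute))
	// [30s 30m30s 1h0m30s]
}
```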

Check the status:


V. Prometheus configuration (for reference)

Rule file rules.yml:

groups:
  - name: host_monitoring
    rules:
      - alert: MemoryAlert
        expr: netdata_system_ram_MiB_average{chart="system.ram",dimension="free",family="ram"} < 800
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: Memory alert
          Server: '{{$labels.instance}}'
          #summary: "{{$labels.instance}}: High Memory usage detected"
          explain: "Memory usage is high: free memory is below 800 MiB; currently free: {{ $value }}M"
          #description: "{{$labels.instance}}: Memory usage is above 80% (current value is: {{ $value }})"
      - alert: CPUAlert
        expr: netdata_system_cpu_percentage_average{chart="system.cpu",dimension="idle",family="cpu"} < 20
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: CPU alert
          Server: '{{$labels.instance}}'
          explain: "CPU usage is above 80%; current idle: {{ $value }}%"
          #summary: "{{$labels.instance}}: High CPU usage detected"
          #description: "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }})"
      - alert: DiskAlert
        expr: netdata_disk_space_GiB_average{chart="disk_space._",dimension="avail",family="/"} < 4
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: Disk alert
          Server: '{{$labels.instance}}'
          explain: "Disk space is low: less than 4 GiB available; currently available: {{ $value }}G"
      - alert: ServiceAlert
        expr: up == 0
        for: 2m
        labels:
          team: node
        annotations:
          Alert_type: Service alert
          Server: '{{$labels.instance}}'
          explain: "Monitored service (scrape target) is down"
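Each rule above uses for: 2m. A condition must therefore hold on every evaluation for two minutes before the alert fires. Until then the alert is pending, and a single evaluation where the condition is false resets it.

The sketch below replays that state machine for the memory rule (< 800) over a series of sampled values. It makes two simplifications:

- The 15s evaluation interval is an assumption. It is set by evaluation_interval in prometheus.yml, which this article does not show.
- Real Prometheus also tracks resolved alerts, which this model omits.

```go
package main

import (
	"fmt"
	"time"
)

// alertState replays rule evaluations for a "value < threshold" rule with a
// `for` duration. values[i] is the expression value at evaluation i, and
// evaluations happen every interval. It returns the state after the last one:
// "inactive", "pending", or "firing".
func alertState(values []float64, threshold float64, interval, forDur time.Duration) string {
	state := "inactive"
	var activeAt time.Duration
	for i, v := range values {
		ts := time.Duration(i) * interval
		if v >= threshold {
			// Condition false: any pending or firing alert is cleared.
			state = "inactive"
			continue
		}
		if state == "inactive" {
			// Condition true for the first time: start the `for` clock.
			state = "pending"
			activeAt = ts
		}
		if state == "pending" && ts-activeAt >= forDur {
			// Condition has held for the whole `for` duration.
			state = "firing"
		}
	}
	return state
}

func main() {
	// Free memory (MiB) at successive evaluations, checked against "< 800" with for: 2m.
	free := []float64{900, 750, 740, 730, 720, 710, 700, 690, 680, 670}
	fmt.Println(alertState(free, 800, 15*time.Second, 2*time.Minute)) // firing
}
```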

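This configuration file has been adapted. YAML is indentation-sensitive: indent with spaces, never tabs. After editing, check that the file is valid. promtool, which ships with Prometheus, can check a rule file directly:

promtool check rules rules.yml

The following online tool can also check whether a YAML file is well-formed:

http://www.bejson.com/validators/yaml_editor/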
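A common cause of YAML errors is a tab character in the indentation, which YAML does not allow. The sketch below is a quick local check: it reports every line whose leading whitespace contains a tab. It does not otherwise validate YAML structure. The file name tabcheck.go is arbitrary.

```go
package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"strings"
)

// tabIndentedLines returns the 1-based numbers of the lines whose leading
// whitespace contains a tab character.
func tabIndentedLines(text string) []int {
	var bad []int
	for i, line := range strings.Split(text, "\n") {
		indent := line[:len(line)-len(strings.TrimLeft(line, " \t"))]
		if strings.Contains(indent, "\t") {
			bad = append(bad, i+1)
		}
	}
	return bad
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: go run tabcheck.go rules.yml")
		return
	}
	data, err := ioutil.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, n := range tabIndentedLines(string(data)) {
		fmt.Printf("line %d: tab in indentation\n", n)
	}
}
```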

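VI. Check the alerts

Stop cAdvisor so that its scrape target goes down and the up == 0 rule fires: docker stop cadvisor

Logs:

After restarting the service:

Admittedly, the alert template is a bit ugly; I'll improve it later. That's all for this test.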
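One way to make the messages more readable is to print the custom annotations defined in rules.yml (Alert_type, Server, explain). The plugin's .tmpl files use Go template syntax.

The standalone sketch below renders such a message with text/template for a simplified, hypothetical alert type modeled on the Alertmanager webhook payload. Adapting it to the plugin's default.tmpl would require matching the data structure the plugin actually passes to its templates, which is not shown here.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// alert is a simplified, hypothetical view of one alert; the field names follow
// the Alertmanager webhook payload (status, labels, annotations).
type alert struct {
	Status      string
	Labels      map[string]string
	Annotations map[string]string
}

// msgTmpl prints the custom annotations from rules.yml, one block per alert.
const msgTmpl = `{{ range . }}[{{ .Status }}] {{ index .Annotations "Alert_type" }}
Server: {{ index .Annotations "Server" }}
Detail: {{ index .Annotations "explain" }}
{{ end }}`

// render executes msgTmpl over a list of alerts and returns the message text.
func render(alerts []alert) (string, error) {
	t, err := template.New("msg").Parse(msgTmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, alerts); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := render([]alert{{
		Status:      "firing",
		Labels:      map[string]string{"alertname": "ExampleAlert", "instance": "localhost:19999"},
		Annotations: map[string]string{"Alert_type": "Service alert", "Server": "localhost:19999", "explain": "Scrape target is down"},
	}})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(out)
}
```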


I'll share more about Prometheus in future posts; follow along if you're interested!
