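Overview
Most of our day-to-day work now runs on DingTalk, so this post shows how to configure DingTalk alerts for Prometheus. It assumes Alertmanager is already deployed.
I. Set up Go
prometheus-webhook-dingtalk, like Prometheus itself, is written in Go, so a Go environment is needed to build it from source (the prebuilt binary release does not need one). Go is cross-platform, supporting Windows, Linux, Mac OS X and other systems, and is also distributed as source that you can compile yourself.
Download: https://studygolang.com/dl
1. Extract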
# tar -xvf go1.13.linux-amd64.tar.gz -C /usr/local/
2. Configure environment variables
echo 'export PATH=$PATH:/usr/local/go/bin' >> /etc/profile
source /etc/profile
3. Test
Run go version to confirm that the installation succeeded:
# go version
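II. Configure the DingTalk robot
1. Open robot management
2. Choose Webhook
3. Choose the group
4. View the robot settings (the Webhook URL shown there, including its access_token, is what prometheus-webhook-dingtalk needs later)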
III. Connect DingTalk to the Prometheus AlertManager WebHook
Plugin download: https://github.com/timonwong/prometheus-webhook-dingtalk
1. Install the webhook
Build from source (create the directory under Go's src directory):
mkdir -p /usr/local/go/src/github.com/timonwong/
cd /usr/local/go/src/github.com/timonwong/
git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git
cd prometheus-webhook-dingtalk
make
Or install the binary release:
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v0.3.0/prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
2. Extract
# tar -xvf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
After extraction, the template file that prometheus-webhook-dingtalk uses for DingTalk alert messages is at:
/usr/local/dingtalk/prometheus-webhook-dingtalk-0.3.0.linux-amd64/default.tmpl
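3. Start prometheus-webhook-dingtalk
nohup ./prometheus-webhook-dingtalk --ding.profile="ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=de544211xxxx96f" >dingding.log 2>&1 &
The name before the first = in --ding.profile (ops_dingding here) is the profile name. It becomes part of the local path that Alertmanager posts to, /dingtalk/<profile name>/send, so it must match the URL configured in alertmanager.yml.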
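The relationship between the --ding.profile value and the URL that Alertmanager later calls is easy to get wrong, so here is a minimal Python sketch of it. The function names parse_ding_profile and local_send_url are made up for illustration; the localhost:8060 default is taken from the alertmanager.yml used in this post, not from the tool's documentation.

```python
from urllib.parse import urlsplit


def parse_ding_profile(spec: str) -> tuple:
    """Split a --ding.profile value of the form NAME=URL into (name, url)."""
    # Split on the first "=" only: the robot URL itself contains "access_token=".
    name, sep, url = spec.partition("=")
    if not sep or not name or not url:
        raise ValueError(f"expected NAME=URL, got {spec!r}")
    if urlsplit(url).scheme not in ("http", "https"):
        raise ValueError(f"profile URL must be http(s): {url!r}")
    return name, url


def local_send_url(profile: str, host: str = "localhost", port: int = 8060) -> str:
    """Return the local endpoint Alertmanager should post to for this profile."""
    return f"http://{host}:{port}/dingtalk/{profile}/send"


name, robot_url = parse_ding_profile(
    "ops_dingding=https://oapi.dingtalk.com/robot/send?access_token=de544211xxxx96f"
)
print(local_send_url(name))  # http://localhost:8060/dingtalk/ops_dingding/send
```

If the webhook is started with the profile name sre instead, the same function gives http://localhost:8060/dingtalk/sre/send, which is the URL to put in webhook_configs.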
4. Configure a systemd service
# vim /etc/systemd/system/prometheus-webhook-dingtalk.service
[Unit]
Description=prometheus-webhook-dingtalk
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/usr/local/dingtalk/prometheus-webhook-dingtalk-0.3.0.linux-amd64/prometheus-webhook-dingtalk --ding.profile=sre=https://oapi.dingtalk.com/robot/send?access_token=de544xxx8ebc04e8da096f

[Install]
WantedBy=multi-user.target

# chmod 644 /etc/systemd/system/prometheus-webhook-dingtalk.service
# systemctl daemon-reload
# systemctl start prometheus-webhook-dingtalk
# systemctl status prometheus-webhook-dingtalk
IV. Configure Alertmanager's email sender and connect it to the DingTalk webhook
/usr/local/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  # Email sender settings
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: '1275758000@qq.com'
  smtp_auth_username: '1275758000@qq.com'
  smtp_auth_password: 'nxxxegb'
  smtp_require_tls: false
route:
  group_by: ['alertname', 'cluster', 'service']
  receiver: default-receiver
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 30m
receivers:
- name: 'default-receiver'
  email_configs:
  - to: '1430985018@qq.com,644642050@qq.com'
  # Connects to the service started by prometheus-webhook-dingtalk
  webhook_configs:
  # "sre" must match the profile name passed to --ding.profile when the webhook was started
  - url: 'http://localhost:8060/dingtalk/sre/send'
    send_resolved: true
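repeat_interval: how long Alertmanager waits before re-sending a notification for an alert that is still firing. Set it to suit your needs; a shorter value is convenient while debugging.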
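To get a feel for how group_wait, group_interval and repeat_interval interact, the following Python sketch estimates when notifications go out for one alert group that keeps firing unchanged. It is a simplified model, not Alertmanager's actual scheduler: it assumes the first notification is sent after group_wait, the group is re-evaluated every group_interval, and a repeat is sent at the first evaluation where at least repeat_interval has passed since the last send. The function name notification_times is made up for illustration.

```python
def notification_times(group_wait, group_interval, repeat_interval, horizon):
    """Approximate send times (in seconds) for one unchanged, still-firing alert group."""
    times = []
    t = group_wait          # first evaluation of the group
    last_sent = None
    while t <= horizon:
        # Send on the first evaluation, then only once repeat_interval has elapsed.
        if last_sent is None or t - last_sent >= repeat_interval:
            times.append(t)
            last_sent = t
        t += group_interval  # next evaluation of the group
    return times


# Values from the route in this post: group_wait 30s, group_interval 2m, repeat_interval 30m.
print(notification_times(30, 120, 1800, horizon=2 * 3600))  # [30, 1830, 3630, 5430]

# A shorter repeat_interval for debugging (4m is an arbitrary example value).
print(notification_times(30, 120, 240, horizon=900))        # [30, 270, 510, 750]
```

Under this model the 30m setting produces roughly two messages per hour per group, while a 4m setting produces one every four minutes, which is why a short value helps during testing and becomes noisy in production.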
Check the status:
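Beyond checking that the services are up, the path from the webhook to DingTalk can be exercised by hand without waiting for a real alert. The Python sketch below builds a minimal payload in the shape Alertmanager sends to webhook receivers (format version 4) and offers a helper to POST it. The alert name TestAlert, the instance value, and the externalURL and generatorURL values are placeholders invented for this example, and whether the 0.3.0 template renders every field is an assumption to verify against your own setup.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_payload(receiver: str = "default-receiver") -> dict:
    """Build a minimal Alertmanager-style webhook payload with one firing alert."""
    labels = {"alertname": "TestAlert", "instance": "localhost:9100", "team": "node"}
    annotations = {"explain": "manual test alert"}
    return {
        "version": "4",
        "groupKey": '{}:{alertname="TestAlert"}',
        "status": "firing",
        "receiver": receiver,
        "groupLabels": {"alertname": "TestAlert"},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "http://localhost:9093",      # placeholder Alertmanager URL
        "alerts": [{
            "status": "firing",
            "labels": labels,
            "annotations": annotations,
            "startsAt": datetime.now(timezone.utc).isoformat(),
            "endsAt": "0001-01-01T00:00:00Z",         # zero time: alert not yet resolved
            "generatorURL": "http://localhost:9090/graph",  # placeholder Prometheus URL
        }],
    }


def post_payload(url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Print the payload so it can be inspected before anything is sent.
print(json.dumps(build_test_payload(), indent=2))
```

With prometheus-webhook-dingtalk running, calling post_payload("http://localhost:8060/dingtalk/sre/send", build_test_payload()) should deliver one test message to the group; the URL is the one configured under webhook_configs.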
V. Prometheus configuration (for reference)
Rules file rules.yml:
groups:
- name: host_monitoring
  rules:
  - alert: MemoryAlert
    expr: netdata_system_ram_MiB_average{chart="system.ram",dimension="free",family="ram"} < 800
    for: 2m
    labels:
      team: node
    annotations:
      Alert_type: Memory alert
      Server: '{{$labels.instance}}'
      #summary: "{{$labels.instance}}: High Memory usage detected"
      explain: "Memory usage is above 90%; currently free: {{ $value }}M"
      #description: "{{$labels.instance}}: Memory usage is above 80% (current value is: {{ $value }})"
  - alert: CPUAlert
    expr: netdata_system_cpu_percentage_average{chart="system.cpu",dimension="idle",family="cpu"} < 20
    for: 2m
    labels:
      team: node
    annotations:
      Alert_type: CPU alert
      Server: '{{$labels.instance}}'
      explain: "CPU usage is above 80%; currently idle: {{ $value }}"
      #summary: "{{$labels.instance}}: High CPU usage detected"
      #description: "{{$labels.instance}}: CPU usage is above 80% (current value is: {{ $value }})"
  - alert: DiskAlert
    expr: netdata_disk_space_GiB_average{chart="disk_space._",dimension="avail",family="/"} < 4
    for: 2m
    labels:
      team: node
    annotations:
      Alert_type: Disk alert
      Server: '{{$labels.instance}}'
      explain: "Disk usage is above 90%; currently available: {{ $value }}G"
  - alert: ServiceAlert
    expr: up == 0
    for: 2m
    labels:
      team: node
    annotations:
      Alert_type: Service alert
      Server: '{{$labels.instance}}'
      explain: "The netdata service is down"
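Each rule above combines a threshold expression with for: 2m, meaning the condition has to hold continuously for two minutes before the alert fires. The Python sketch below models that pending-to-firing behaviour on a list of samples. It is a simplification that assumes one rule evaluation per sample; the function name firing_since and the sample values are invented for illustration.

```python
def firing_since(samples, threshold, for_seconds):
    """Return the evaluation time at which the alert starts firing, or None.

    samples: list of (eval_time_seconds, value) in time order.
    The condition is value < threshold, as in the rules above. The alert is
    pending from the first evaluation where the condition holds, and fires
    once it has held for for_seconds without interruption.
    """
    active_at = None
    for t, value in samples:
        if value < threshold:
            if active_at is None:
                active_at = t          # condition just became true: pending
            if t - active_at >= for_seconds:
                return t               # held long enough: firing
        else:
            active_at = None           # condition cleared: pending state resets
    return None


# Free memory in MiB sampled every 60 s; threshold 800 and for: 2m as in the memory rule.
dip = [(0, 900), (60, 750), (120, 850), (180, 700), (240, 690), (300, 680)]
print(firing_since(dip, threshold=800, for_seconds=120))  # 300
```

The short dip at 60 s does not fire because memory recovers at 120 s; only the sustained drop from 180 s onward reaches the two-minute mark at 300 s. This is the reason for using for: it filters out brief spikes.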
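This file has been modified from the original. YAML is strict about formatting, indentation in particular, so check the file after editing. Prometheus ships with promtool, and promtool check rules rules.yml validates a rules file locally.
The online formatter below can also check whether the file is well-formed: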
http://www.bejson.com/validators/yaml_editor/
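VI. View the alerts
Stop cadvisor: docker stop cadvisor
Logs:
After restarting the service:
Well, the alert template is a bit ugly. I'll improve it later; that's it for this test.
I'll share more about Prometheus in later posts, so follow along if you're interested!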