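1. In Grafana, click Create and then select Import.
2. Enter 10477 (the dashboard ID) and click Load.
3. Adjust the import options as needed, then click Import.
4. Edit prometheus.yml:
  - job_name: 'rocketmq'
    file_sd_configs:
      - files:
          - prod/rocketmq.yml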
5. Edit prod/rocketmq.yml:
- targets:
    - '172.63.11.214:5557'
    - '172.63.11.161:5557'
  labels:
    env: prod
    project: datacenter_rocketmq
    name: datacenter_rocketmq
    cluster: datacenter_cluster
6. Deploy rocketmq-exporter:
git clone https://github.com/apache/rocketmq-exporter
cd rocketmq-exporter
mvn clean install
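Note that pom.xml needs to be modified (with the relevant parts commented out) before you build. After compilation, the package is located at the following path: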
ll -h /var/jenkins_home/repository/org/apache/rocketmq-exporter/0.0.2-SNAPSHOT/
total 31M
-rw-r--r-- 1 root root 713 Mar 7 17:53 maven-metadata-local.xml
-rw-r--r-- 1 root root 218 Mar 7 17:53 _remote.repositories
-rw-r--r-- 1 root root 31M Mar 7 17:53 rocketmq-exporter-0.0.2-SNAPSHOT.jar
-rw-r--r-- 1 root root 4.3K Mar 7 14:40 rocketmq-exporter-0.0.2-SNAPSHOT.pom
7. Start rocketmq-exporter using a systemd unit such as the following:
[Unit]
Description=rocketmq-exporter
After=network.target
[Service]
Type=simple
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=rocketmq-exporter
User=root
WorkingDirectory=/data/deploy/rocketmq-exporter
ExecStart=/usr/local/java/bin/java -jar /data/deploy/rocketmq-exporter/rocketmq-exporter-0.0.2-SNAPSHOT.jar
KillMode=process
TimeoutStopSec=60
Restart=on-failure
RestartSec=5
RemainAfterExit=no
[Install]
WantedBy=multi-user.target
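Once the service is running, it can help to confirm that the exporter actually serves RocketMQ metrics before looking at Grafana. The following is a minimal sketch, not part of the original setup. It assumes the exporter exposes the Prometheus text format at the path /metrics on port 5557 (the port comes from prod/rocketmq.yml; the path is an assumption), and the helper names metric_names and rocketmq_metrics are hypothetical.

```python
import urllib.request


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{labels} value` or `name value`.
        parts = line.split("{", 1)[0].split()
        if parts:
            names.add(parts[0])
    return names


def rocketmq_metrics(url, timeout=5.0):
    """Fetch a metrics page and return the sorted rocketmq_* metric names."""
    # Assumption: `url` points at the exporter, e.g. http://<host>:5557/metrics
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return sorted(n for n in metric_names(body) if n.startswith("rocketmq_"))
```

Calling rocketmq_metrics("http://172.63.11.214:5557/metrics") against one of the targets configured earlier should return a non-empty list if the exporter can reach the cluster. An empty list suggests the exporter is up but is not collecting from RocketMQ.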
8. Finally, view the instance dashboards in Grafana.
9. Configure monitoring alerts. Edit prometheus.yml:
rule_files:
  - rule/*.yml
Configure rules_rocketmq.yml:
cat rule/rules_rocketmq.yml
groups:
  - name: rocketmq
    rules:
      - alert: RocketMQ Exporter is Down
        expr: up{job="rocketmq"} == 0
        for: 20s
        labels:
          severity: 'disaster'
        annotations:
          summary: RocketMQ {{ $labels.instance }} is down
      - alert: RocketMQ message backlog
        expr: (sum(irate(rocketmq_producer_offset[1m])) by (topic) - on(topic) group_right sum(irate(rocketmq_consumer_offset[1m])) by (group,topic)) > 5
        for: 5m
        labels:
          severity: 'warning'
        annotations:
          summary: RocketMQ (group={{ $labels.group }} topic={{ $labels.topic }}) backlog = {{ .Value }}
      - alert: GroupGetLatencyByStoretime consumer group latency too high
        expr: rocketmq_group_get_latency_by_storetime/1000 > 10 and rate(rocketmq_group_get_latency_by_storetime[5m]) > 0
        for: 3m
        labels:
          severity: warning
        annotations:
          description: 'consumer {{$labels.group}} on {{$labels.broker}}, {{$labels.topic}} consume time lag behind message store time
            and (behind value is {{$value}}).'
          summary: consumer group consumption latency is too high
      - alert: RocketMQClusterProduceHigh cluster TPS >= 20
        expr: sum(rocketmq_producer_tps) by (cluster) >= 20
        for: 3m
        labels:
          severity: warning
        annotations:
          description: '{{$labels.cluster}} Sending tps too high. now TPS = {{ .Value }}'
          summary: cluster send tps too high
View the alert information.
An alternative set of alerting rules:
groups:
  - name: GaleraAlerts
    rules:
      - alert: RocketMQClusterProduceHigh
        expr: sum(rocketmq_producer_tps) by (cluster) >= 10
        for: 3m
        labels:
          severity: warning
        annotations:
          description: '{{$labels.cluster}} Sending tps too high.'
          summary: cluster send tps too high
      - alert: RocketMQClusterProduceLow
        expr: sum(rocketmq_producer_tps) by (cluster) < 1
        for: 3m
        labels:
          severity: warning
        annotations:
          description: '{{$labels.cluster}} Sending tps too low.'
          summary: cluster send tps too low
      - alert: RocketMQClusterConsumeHigh
        expr: sum(rocketmq_consumer_tps) by (cluster) >= 10
        for: 3m
        labels:
          severity: warning
        annotations:
          description: '{{$labels.cluster}} consuming tps too high.'
          summary: cluster consume tps too high
      - alert: RocketMQClusterConsumeLow
        expr: sum(rocketmq_consumer_tps) by (cluster) < 1
        for: 3m
        labels:
          severity: warning
        annotations:
          description: '{{$labels.cluster}} consuming tps too low.'
          summary: cluster consume tps too low
      - alert: ConsumerFallingBehind
        expr: (sum(rocketmq_producer_offset) by (topic) - on(topic) group_right sum(rocketmq_consumer_offset) by (group,topic)) - ignoring(group) group_left sum (avg_over_time(rocketmq_producer_tps[5m])) by (topic)*5*60 > 0
        for: 3m
        labels:
          severity: warning
        annotations:
          description: 'consumer {{$labels.group}} on {{$labels.topic}} lag behind
            and is falling behind (behind value {{$value}}).'
          summary: consumer lag behind
      - alert: GroupGetLatencyByStoretime
        expr: rocketmq_group_get_latency_by_storetime > 1000
        for: 3m
        labels:
          severity: warning
        annotations:
          description: 'consumer {{$labels.group}} on {{$labels.broker}}, {{$labels.topic}} consume time lag behind message store time
            and (behind value is {{$value}}).'
          summary: message consumes time lag behind message store time too much
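The ConsumerFallingBehind expression is the least obvious of these rules, so here is a small worked example of its arithmetic. This is only an illustrative sketch for a single (group, topic) pair. The function name falling_behind_margin and the numbers are hypothetical, and Prometheus itself performs the label matching that is omitted here.

```python
def falling_behind_margin(producer_offset, consumer_offset,
                          avg_producer_tps, window_seconds=5 * 60):
    """Mirror the ConsumerFallingBehind arithmetic for one (group, topic).

    producer_offset   ~ sum(rocketmq_producer_offset) by (topic)
    consumer_offset   ~ sum(rocketmq_consumer_offset) by (group, topic)
    avg_producer_tps  ~ sum(avg_over_time(rocketmq_producer_tps[5m])) by (topic)
    """
    # Messages produced but not yet consumed by this group.
    lag = producer_offset - consumer_offset
    # Roughly how many messages the topic received in the last 5 minutes.
    produced_in_window = avg_producer_tps * window_seconds
    # The rule fires (after `for: 3m`) while this value stays above zero.
    return lag - produced_in_window


# A group 40,000 messages behind on a topic producing 100 msg/s:
# 40,000 - 100 * 300 = 10,000 > 0, so the condition holds.
margin = falling_behind_margin(100_000, 60_000, 100)
print(margin, margin > 0)
```

In other words, the rule treats a consumer group as falling behind when its backlog is larger than about five minutes of production on that topic.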
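| Metric | Meaning |
| --- | --- |
| rocketmq_broker_tps | Messages produced per second on the broker |
| rocketmq_broker_qps | Messages consumed per second on the broker |
| rocketmq_producer_tps | Messages produced per second for a topic |
| rocketmq_producer_put_size | Size of messages produced per second for a topic (bytes) |
| rocketmq_producer_offset | Production progress (offset) of a topic |
| rocketmq_consumer_tps | Messages consumed per second by a consumer group |
| rocketmq_consumer_get_size | Size of messages consumed per second by a consumer group (bytes) |
| rocketmq_consumer_offset | Consumption progress (offset) of a consumer group |
| rocketmq_group_get_latency_by_storetime | Consumption latency of a consumer group |
| rocketmq_message_accumulation (rocketmq_producer_offset - rocketmq_consumer_offset) | Message backlog (production progress minus consumption progress) |

rocketmq_message_accumulation is an aggregate metric: the exporter does not report it directly, so it has to be derived from the other reported metrics.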
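To make the aggregation concrete, the sketch below derives the backlog per (group, topic) from a metrics page in Prometheus text format. It mirrors the subtraction used in the ConsumerFallingBehind rule, summing offsets across brokers. It is a simplified illustration under stated assumptions: the function names and the sample values are hypothetical, and the regular expressions handle only ordinary sample lines.

```python
import re
from collections import defaultdict

# Matches `name{labels} value` or `name value`; a trailing timestamp is ignored.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if not m:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        yield m.group("name"), labels, float(m.group("value"))


def message_accumulation(text):
    """Return {(group, topic): producer offset sum - consumer offset sum}."""
    produced = defaultdict(float)  # topic -> summed producer offset
    consumed = defaultdict(float)  # (group, topic) -> summed consumer offset
    for name, labels, value in parse_samples(text):
        if name == "rocketmq_producer_offset":
            produced[labels.get("topic", "")] += value
        elif name == "rocketmq_consumer_offset":
            key = (labels.get("group", ""), labels.get("topic", ""))
            consumed[key] += value
    # Only topics that have a producer offset can be compared.
    return {
        (group, topic): produced[topic] - offset
        for (group, topic), offset in consumed.items()
        if topic in produced
    }


# Hypothetical sample: one topic spread over two brokers, one consumer group.
SAMPLE = """
rocketmq_producer_offset{cluster="c",broker="b-a",topic="orders"} 1500.0
rocketmq_producer_offset{cluster="c",broker="b-b",topic="orders"} 500.0
rocketmq_consumer_offset{cluster="c",broker="b-a",topic="orders",group="billing"} 1400.0
rocketmq_consumer_offset{cluster="c",broker="b-b",topic="orders",group="billing"} 450.0
"""
print(message_accumulation(SAMPLE))
```

In a running setup the same value is normally computed inside Prometheus with the PromQL subtraction shown in the rules, rather than in a separate script.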
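
| Alert expression | Meaning |
| --- | --- |
| sum(rocketmq_producer_tps) by (cluster) >= 10 | Cluster send TPS too high |
| sum(rocketmq_producer_tps) by (cluster) < 1 | Cluster send TPS too low |
| sum(rocketmq_consumer_tps) by (cluster) >= 10 | Cluster consume TPS too high |
| sum(rocketmq_consumer_tps) by (cluster) < 1 | Cluster consume TPS too low |
| rocketmq_group_get_latency_by_storetime > 1000 | Consumption latency alert |
| rocketmq_message_accumulation > value | Message backlog alert |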