[Big Data] False alerts when monitoring Postgres primary/standby replication lag


When Postgres is monitored with Prometheus, a replication-lag alert can fire even though the database is healthy. The exporter we use is https://github.com/prometheus-community/postgres_exporter, and the alert expression is:

pg_replication_lag > 300

The exporter describes this metric as follows:

# HELP pg_replication_lag Replication lag behind master in seconds
# TYPE pg_replication_lag gauge
pg_replication_lag{server=""}

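In practice, however, this metric measures how long it has been since the standby last replayed a transaction from the primary. From https://github.com/prometheus-community/postgres_exporter/blob/master/queries.yaml we can see that the metric is produced by this SQL: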

SELECT
CASE
  WHEN NOT pg_is_in_recovery() THEN 0
  ELSE GREATEST(0, EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())))
END
AS lag

The official documentation describes the two functions as follows:

Name	Return Type	Description
pg_is_in_recovery()	bool	True if recovery is still in progress.
pg_last_xact_replay_timestamp()	timestamp with time zone	Get time stamp of last transaction replayed during recovery. This is the time at which the commit or abort WAL record for that transaction was generated on the primary. If no transactions have been replayed during recovery, this function returns NULL. Otherwise, if recovery is still in progress this will increase monotonically. If recovery has completed then this value will remain static at the value of the last transaction applied during that recovery. When the server has been started normally without recovery the function returns NULL.

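The query therefore returns 0 on the primary. On a standby it returns the difference between the current time and the commit timestamp of the last replayed transaction.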

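One special case follows from this: if no transactions are committed on the primary, the value of pg_last_xact_replay_timestamp() stays unchanged and pg_replication_lag keeps growing, even though replication has not failed.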
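
To make the special case concrete, here is a minimal Python sketch that mirrors the logic of the query above. The function name `replication_lag` and the timeline are hypothetical and only illustrate the arithmetic; this is not code from the exporter. The `None` branch relies on the fact that PostgreSQL's GREATEST ignores NULL arguments.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def replication_lag(in_recovery: bool, now: datetime,
                    last_replay: Optional[datetime]) -> float:
    """Mirror the exporter query: 0 on a primary, else seconds since the last replay."""
    if not in_recovery:
        return 0.0
    if last_replay is None:
        # PostgreSQL's GREATEST ignores NULL arguments, so GREATEST(0, NULL) is 0.
        return 0.0
    return max(0.0, (now - last_replay).total_seconds())


# Hypothetical timeline: the last commit on the primary was replayed at 12:00:00
# and the primary has been idle ever since. Replication itself is healthy.
last_commit = datetime(2022, 4, 23, 12, 0, 0, tzinfo=timezone.utc)
for minutes in (1, 4, 6):
    now = last_commit + timedelta(minutes=minutes)
    lag = replication_lag(True, now, last_commit)
    print(f"{minutes} min idle: lag={lag:.0f}s, alert={lag > 300}")
```

After six idle minutes the computed value is 360 seconds, which crosses the 300-second threshold although nothing is actually waiting to be replayed.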

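To avoid false alerts, we can create a test table on the primary and update a row in it once a minute to keep the database active. If an alert still fires after that, it really does indicate a serious replication problem. The steps are: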

psql (11.7)
Type "help" for help.

postgres=# CREATE DATABASE test;
postgres=# \c test
postgres=# CREATE TABLE test(id INT PRIMARY KEY  NOT NULL DEFAULT 1,  time varchar(255));
postgres=# INSERT INTO "public"."test"("time") VALUES ('123');

Configure a cron job:

# crontab -e
* * * * * time=`date`;/usr/pgsql-11/bin/psql -h localhost -p 18083 -d test -c "UPDATE public.test SET time = '${time}'"
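
The effect of the once-a-minute update can be checked with a small simulation. This is a sketch under simplifying assumptions: time is counted in whole seconds, the standby is healthy, and every commit is replayed after a fixed `replay_delay`. The function `max_reported_lag` and its parameters are hypothetical names chosen for the illustration.

```python
def max_reported_lag(commit_times, replay_delay, horizon):
    """Largest value the lag query would report on a healthy standby.

    commit_times: sorted seconds at which the primary commits a transaction
    replay_delay: seconds the standby needs to replay each commit
    horizon:      last second at which the metric is sampled
    """
    worst = 0
    for now in range(horizon + 1):
        # Commit timestamps of the transactions already replayed at 'now'.
        replayed = [t for t in commit_times if t + replay_delay <= now]
        if not replayed:
            continue  # nothing replayed yet, so the query yields 0
        worst = max(worst, now - replayed[-1])
    return worst


horizon = 3600  # observe one hour
idle = max_reported_lag([0], replay_delay=1, horizon=horizon)
heartbeat = max_reported_lag(list(range(0, horizon + 1, 60)),
                             replay_delay=1, horizon=horizon)
print(f"idle primary: {idle}s, with heartbeat: {heartbeat}s")
```

With a single commit followed by an idle hour the reported value climbs to 3600 seconds. With an update every 60 seconds it never exceeds 60 seconds, so under these assumptions the 300-second threshold is only crossed when replay really stops.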

Added: 2022-04-23 10:53:19 Updated: 2022-04-23 10:54:52