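1. Problem
I have recently been writing custom rule nodes. Because I have not yet managed to build the ThingsBoard source code, I am using an environment set up with Docker. After writing a rule node, I have to package it, copy it into the Docker container, and restart the container. The restart often fails, and the error log always looks like this: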
2022-03-05 08:53:23,164 [main] ERROR com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Exception during pool initialization.
org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:303)
...
2022-03-05 08:53:25,187 [main] ERROR o.s.boot.SpringApplication - Application run failed
org.springframework.context.ApplicationContextException: Unable to start web server; nested exception is org.springframework.boot.web.server.WebServerException: Unable to start embedded Tomcat
at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.onRefresh(ServletWebServerApplicationContext.java:161)
...
pg_ctl: could not open PID file "/data/db/postmaster.pid": Operation not permitted
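2. Cause
I tried deleting the container and running it again, I tried re-granting permissions (which seemed to work the first time), and I tried docker-compose; the result was always the same. In the end I found the solution in the project's issues. Thanks to alyf80 for the explanation: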
I think this is due to postgres not being properly shut down when the container is stopped; this leaves a stale postmaster.pid file in the data/db directory, which on the next run causes pg_ctl to try talking to a non-existing daemon and ultimately failing to start the db server. Deleting that file prior to starting the container reliably fixes the problem in my case.
I have no idea why this is happening. It looks to me that the TB process is ignoring (or simply not getting) the SIGTERM that Docker sends to initiate a graceful shutdown of the container, so after the default 10s shutdown timeout expires everything gets killed without stop-db.sh having had a chance to run.
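In short: when the Docker container is stopped, PostgreSQL may not receive the shutdown signal properly, so the database is never shut down cleanly and its stale state (the postmaster.pid file) is left behind. On the next start the database server fails to come up, and ThingsBoard cannot connect to it.
Note: it seems that most of the time the database is not shut down cleanly and only occasionally is, which is why a restart sometimes succeeds after a few retries.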
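The quoted comment says that deleting the stale postmaster.pid before starting the container fixes the problem. A minimal sketch of that step follows. The function name remove_stale_pid and the host path ~/.mytb-data are assumptions for illustration; use whatever host directory you mounted to /data in the container. It should only be run while the container is stopped, and it may need sudo if the file is owned by the container's user.

```shell
# Remove a stale postmaster.pid from the host directory mounted at /data.
# Hypothetical helper: run it only while the container is stopped.
remove_stale_pid() {
    local data_dir="$1"
    local pid_file="${data_dir}/db/postmaster.pid"
    if [ -f "$pid_file" ]; then
        rm -f "$pid_file"
        echo "removed ${pid_file}"
    else
        echo "no stale pid file in ${data_dir}/db"
    fi
}

# Example usage (the path is an assumption, adjust to your volume mapping):
#   docker stop mytb
#   remove_stale_pid ~/.mytb-data
#   docker start mytb
```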
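3. Solution
Since the cause is a database that was not shut down cleanly, the fix is simple: stop the DB before restarting the container. Here are several ways to do it (to broaden the options; all of them worked when I tried them):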
$ docker exec -it mytb stop-db.sh
$ docker exec -it mytb killall postgres
$ docker exec -it mytb bash
$ killall postgres
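The commands above can be combined into a single helper so the database is always stopped before the restart. This is only a sketch: the function name safe_restart is hypothetical, the container name mytb is taken from the commands above, and the fallback to killall is my own assumption about a sensible order, not something from the issue thread.

```shell
# Stop PostgreSQL inside the container first, then restart the container.
# Hypothetical helper; the container name defaults to "mytb".
safe_restart() {
    local container="${1:-mytb}"
    # No -it here: a TTY is not needed and is not available in scripts.
    if ! docker exec "$container" stop-db.sh; then
        # Fall back to killing postgres directly if stop-db.sh fails.
        docker exec "$container" killall postgres || true
    fi
    docker restart "$container"
}

# Example usage:
#   safe_restart mytb
```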
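This analysis also suggests why the first restart after changing the permissions succeeded: changing the permissions presumably stopped the DB, so the restart that followed worked normally.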