问题
org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 minute(s)
at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70)
at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41)
at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:268)
at org.springframework.data.redis.connection.lettuce.LettuceKeyCommands.convertLettuceAccessException(LettuceKeyCommands.java:817)
at org.springframework.data.redis.connection.lettuce.LettuceKeyCommands.exists(LettuceKeyCommands.java:80)
at org.springframework.data.redis.connection.DefaultedRedisConnection.exists(DefaultedRedisConnection.java:55)
at org.springframework.data.redis.core.RedisTemplate.lambda$hasKey$6(RedisTemplate.java:770)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:225)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:185)
at org.springframework.data.redis.core.RedisTemplate.hasKey(RedisTemplate.java:770)
系统间歇性出现上述bug,异常信息很直白,redis 命令执行超时,出现此异常时,分别在客户端和服务器端查看网络连接(netstat -anp|grep 6379 ),发现了了两种情况, 第一种,客户端链接还在,服务器端链接不存在。 第二种,客户端和服务器端链接都在。 查阅资料得知,redis服务器有保活机制,默认5分钟发送一次心跳,以防止客户端意外中断,在/data/redis-${version}/redis.conf 中
# Close the connection after a client is idle for N seconds (0 to disable)
timeout 0
# TCP keepalive.
#
# If non-zero, use SO_KEEPALIVE to send TCP ACKs to clients in absence
# of communication. This is useful for two reasons:
#
# 1) Detect dead peers.
# 2) Take the connection alive from the point of view of network
# equipment in the middle.
#
# On Linux, the specified value (in seconds) is the period used to send ACKs.
# Note that to close the connection the double of the time is needed.
# On other kernels the period depends on the kernel configuration.
#
# A reasonable value for this option is 300 seconds, which is the new
# Redis default starting with Redis 3.2.1.
tcp-keepalive 300
通过wireshark抓包发现,服务器端确实是每隔5分钟发送一次ACK,与配置文件是对应起来的。百度得知,很多人遇到过此问题,说是将lettuce换位jedis就能解决,至于为什么却没说!查看maven pom.xml文件
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>
而spring-boot-starter-data-redis确实是依赖了lettuce-core
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-redis</artifactId>
<version>2.1.15.RELEASE</version>
<scope>compile</scope>
<exclusions>
<exclusion>
<artifactId>jcl-over-slf4j</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>io.lettuce</groupId>
<artifactId>lettuce-core</artifactId>
<version>5.1.8.RELEASE</version>
<scope>compile</scope>
</dependency>
解决
更改为jedis依赖
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-redis</artifactId>
<exclusions>
<exclusion>
<artifactId>lettuce-core</artifactId>
<groupId>io.lettuce</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>redis.clients</groupId>
<artifactId>jedis</artifactId>
</dependency>
将lettuce更改为jedis后,抓包发现,确实有心跳机制,redis客户端定时向服务器发送心跳包。
原理
通过看源码找到jedis时的配置,
redis.clients.jedis.JedisPoolConfig
public class JedisPoolConfig extends GenericObjectPoolConfig {
public JedisPoolConfig() {
// defaults to make your life with connection pool easier :)
//在父类GenericObjectPoolConfig中默认值是false,lettuce时没有设置此字段使用了默认值false,所以lettuce没有心跳
setTestWhileIdle(true);
setMinEvictableIdleTimeMillis(60000);
setTimeBetweenEvictionRunsMillis(30000);
setNumTestsPerEvictionRun(-1);
}
}
上一步设置了 setTestWhileIdle(true),GenericObjectPool会定时校验链接合法性
org.apache.commons.pool2.impl.GenericObjectPool#evict
//因为上一步,这里的testWhileIdle是true
if (testWhileIdle) {
boolean active = false;
try {
factory.activateObject(underTest);
active = true;
} catch (final Exception e) {
destroy(underTest);
destroyedByEvictorCount.incrementAndGet();
}
if (active) {
//发送心跳,调用factory.validateObject方法
if (!factory.validateObject(underTest)) {
destroy(underTest);
destroyedByEvictorCount.incrementAndGet();
} else {
try {
factory.passivateObject(underTest);
} catch (final Exception e) {
destroy(underTest);
destroyedByEvictorCount.incrementAndGet();
}
}
}
}
jedis的factory就是redis.clients.jedis.JedisFactory
redis.clients.jedis.JedisFactory#validateObject
public boolean validateObject(PooledObject<Jedis> pooledJedis) {
final BinaryJedis jedis = pooledJedis.getObject();
try {
HostAndPort hostAndPort = this.hostAndPort.get();
String connectionHost = jedis.getClient().getHost();
int connectionPort = jedis.getClient().getPort();
return hostAndPort.getHost().equals(connectionHost)
&& hostAndPort.getPort() == connectionPort && jedis.isConnected()
//发送了PING并回复了PONG,则认为链接可用,wireshark抓包得知,客户端的确发送了PING,服务端恢复了PONG
&& jedis.ping().equals("PONG");
} catch (final Exception e) {
return false;
}
}
总结
至此真像大白,当网络出现波动,通信中间节点将链接桥段,因为lettuce没有心跳,链接变为死链了,但没有校验机制所以仍在连接池上,当使用到死链时,就会执行超时; jedis设置了testWhileIdle是true,会对空闲链接进行校验,如果发现死链则会从连接池中销毁,所以jedis不会出现问题。 至于开头提到的遇到死链的两种情况,
- 客户端有链接,服务器端没链接;
这种情况很有可能是因为网络波动,通信的中间节点掐断了链接,但客户端不知晓所以客户端的链接仍然在,而服务器端因为会向客户端发送心跳,所以服务器端把链接删掉了; - 客户端和服务器端都有链接;
这种情况跟上边一种情况原因是相同的,很有可能是因为通信的中间节点掐断了链接,但客户端没有心跳所以客户端的链接仍然在,而服务器端每隔5分钟发送一次心跳,可能还未到发送心跳时刻,所以服务器端也认为链接可用。当到了5分钟,服务器端发送心跳发现链接不可用时,关闭服务器端链接,就会变成第一种情况。
|