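Another Day of Filling in Pitfalls
Background
A user suddenly reported that a page had frozen and could not be operated. That was all the information we were given, and we were asked to track down the problem. The investigation was hard, so here is the conclusion up front: a former colleague had used the socket incorrectly, which drove memory usage so high that no event could get a response.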
Original code
- Environment
flutter: 2.10.4
dart: 2.16.2
socket_io_client: 0.9.12
- Original code
import 'package:socket_io_client/socket_io_client.dart' as IO;

class SocketIoUtil {
  // The declaration of `socket` is not shown in the original excerpt;
  // it is assumed here to be a static field (pre-null-safety Dart).
  static IO.Socket socket;
  static bool retryConnect = false;
  static var messDate;

  static Future socketIo() async {
    retryConnect = true;
    onConnect();
  }

  static Future dispose() async {
    retryConnect = false;
    socket?.disconnect();
    socket = null;
    messDate = null;
    return null;
  }

  static Future onConnect() async {
    print("socket:onConnect");
    String connectUrl="http://www.xxx.com:1414";
    socket = IO.io(
        connectUrl, IO.OptionBuilder().setTransports(['websocket']).build());
    socket.on(
        "message",
        (data) => {
              onMessage(data.toString()),
            });
    socket.onDisconnect((data) => {
          print("socket:连接断开"), // "connection lost"
          _retryConnectSocketIo(),
        });
    socket.onConnect((data) => {
          print("socket:连接成功"), // "connected"
        });
    socket.onConnectError((data) => {
          print("socket:连接出错"), // "connection error"
          _retryConnectSocketIo(),
        });
  }

  static onMessage(String string) {
    // Business handling of the received message (omitted in the original).
  }

  static _retryConnectSocketIo() {
    if (retryConnect) {
      print("socket:开启重新连接"); // "starting reconnect"
      Future.delayed(Duration(seconds: 10), () {
        onConnect();
      });
    }
  }
}
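Analysis
The logic is roughly this: open a socket; if the connection succeeds, run the business handling on each received message; otherwise retry the connection after 10 s. It looks harmless, but the log printed during a real run was as follows:
Fixing the problem
1.1 Fixing excessive retries
As the original code shows, a failed connection calls _retryConnectSocketIo, which calls onConnect after a 10 s delay. The log, however, shows that 连接出错 ("connection error") was printed 3 more times during that 10 s delay. So after the next 10 s there are 4 onConnect calls in total, each of which again produces 4 连接出错 lines, and 10 s after that 4*4 onConnect calls are made. Every 10 s the number of socket connections is multiplied by 4, until memory usage grows so large that the app freezes.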
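The growth described above can be put into numbers with a few lines of Dart. This is only a back-of-the-envelope sketch: the function name connectCallsInRound and the errorsPerCall parameter are made up for illustration, and it assumes every onConnect call really does end in 4 error callbacks, as the log suggested.

```dart
/// Number of onConnect() calls made in retry round [round] (round 0 is the
/// initial call), assuming every call ends up scheduling [errorsPerCall]
/// retries. Both names are hypothetical and only model the reasoning above.
int connectCallsInRound(int round, {int errorsPerCall = 4}) {
  var calls = 1;
  for (var i = 0; i < round; i++) {
    calls *= errorsPerCall; // each call spawns errorsPerCall calls 10 s later
  }
  return calls;
}
```

With a 10 s delay per round, connectCallsInRound(6) is the number of calls made one minute after the first failure, which makes it easy to see why the app eventually stops responding.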
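These extra connection errors were not triggered by our own code, however, so I suspected that the socket itself has a built-in retry-on-failure feature, and changed the code as follows: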
...
static Future onConnect() async {
print("socket:onConnect");
String connectUrl="http://www.xxx.com:1414";
socket = IO.io(
connectUrl,
IO.OptionBuilder().setTransports(['websocket']).disableReconnection().build());
...
I thought that had solved the problem, but then something strange happened. Look at the log:
2022-07-14 21:20:30.785 13742-13791/com.acewill.kvs_operation I/flutter: socket:onConnect
2022-07-14 21:20:30.914 13742-13791/com.acewill.kvs_operation I/flutter: socket:连接出错
2022-07-14 21:20:30.914 13742-13791/com.acewill.kvs_operation I/flutter: socket:开启重新连接
2022-07-14 21:20:40.924 13742-13791/com.acewill.kvs_operation I/flutter: socket:onConnect
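No more log output follows. 连接出错 ("connection error") is never printed after the second onConnect, which means the socket obtained on the second pass through onConnect did not connect automatically.
1.2 Fixing the missing automatic connection
The socket in socket_io_client connects automatically, yet with the change above the second pass through onConnect no longer connects. On a hunch, I printed the socket's hashCode: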
2022-07-14 21:42:36.112 16057-16129/com.acewill.kvs_operation I/flutter: socket:onConnect
2022-07-14 21:42:36.192 16057-16129/com.acewill.kvs_operation I/flutter: socket:hashcode_726189657
2022-07-14 21:42:36.242 16057-16129/com.acewill.kvs_operation I/flutter: socket:连接出错
2022-07-14 21:42:36.243 16057-16129/com.acewill.kvs_operation I/flutter: socket:开启重新连接
2022-07-14 21:42:46.246 16057-16129/com.acewill.kvs_operation I/flutter: socket:onConnect
2022-07-14 21:42:46.247 16057-16129/com.acewill.kvs_operation I/flutter: socket:hashcode_726189657
...
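The two values are identical, which means that although the socket is created inside onConnect, it is still the original object. That explains the behavior: the first onConnect creates the socket and triggers the automatic connection; when onConnect is entered again, the automatic connection has already been performed, so nothing happens this time. But why is it the same socket, when it is clearly created anew in onConnect? Look at the code that creates the socket: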
Socket io(uri, [opts]) => _lookup(uri, opts);
Socket _lookup(uri, opts) {
...
if (newConnection) {
io = Manager(uri: uri, options: opts);
} else {
io = cache[id] ??= Manager(uri: uri, options: opts);
}
...
return io.socket(parsed.path.isEmpty ? '/' : parsed.path, opts);
}
Map<String, Socket> nsps;
Socket socket(String nsp, Map opts) {
var socket = nsps[nsp];
}
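The code above shows that as long as the address and port stay the same, IO.io returns the same socket. With the cause found, the fix is simple: switch from automatic connection to connecting manually. The code: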
...
static Future onConnect() async {
print("socket:onConnect");
String connectUrl="http://www.xxx.com:1414";
socket = IO.io(connectUrl,
IO.OptionBuilder().setTransports(['websocket'])
.disableReconnection().disableAutoConnect().build());
...
socket.connect();
...
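The caching behavior behind the identical hashCode can be reproduced with a toy model. The sketch below does not use socket_io_client at all: FakeManager, lookup and the forceNew flag are stand-ins invented for illustration, and the cache is keyed by the whole URL for simplicity, whereas the library derives its own id from the parsed URI.

```dart
// Toy stand-in for the library's Manager; not the real class.
class FakeManager {
  final String uri;
  FakeManager(this.uri);
}

// Toy cache, playing the role of `cache` in the excerpt above.
final Map<String, FakeManager> _cache = {};

/// Returns a cached FakeManager for [uri] unless [forceNew] is set,
/// mirroring the `cache[id] ??= Manager(...)` line in the library excerpt.
FakeManager lookup(String uri, {bool forceNew = false}) {
  if (forceNew) {
    return FakeManager(uri); // bypasses the cache entirely
  }
  return _cache[uri] ??= FakeManager(uri);
}
```

Calling lookup twice with the same URL returns the very same object, which is why the second IO.io call in onConnect did not produce a fresh socket.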
Let's try again:
2022-07-14 22:14:34.384 17786-17877/com.acewill.kvs_operation I/flutter: socket:onConnect
2022-07-14 22:14:34.489 17786-17877/com.acewill.kvs_operation I/flutter: socket:连接出错
2022-07-14 22:14:34.490 17786-17877/com.acewill.kvs_operation I/flutter: socket:开启重新连接
2022-07-14 22:14:44.493 17786-17877/com.acewill.kvs_operation I/flutter: socket:onConnect
2022-07-14 22:14:44.539 17786-17877/com.acewill.kvs_operation I/flutter: socket:连接出错
2022-07-14 22:14:44.540 17786-17877/com.acewill.kvs_operation I/flutter: socket:开启重新连接
2022-07-14 22:14:44.540 17786-17877/com.acewill.kvs_operation I/flutter: socket:连接出错
2022-07-14 22:14:44.541 17786-17877/com.acewill.kvs_operation I/flutter: socket:开启重新连接
2022-07-14 22:14:54.543 17786-17877/com.acewill.kvs_operation I/flutter: socket:onConnect
2022-07-14 22:14:54.553 17786-17877/com.acewill.kvs_operation I/flutter: socket:onConnect
2022-07-14 22:14:54.574 17786-17877/com.acewill.kvs_operation I/flutter: socket:连接出错
2022-07-14 22:14:54.575 17786-17877/com.acewill.kvs_operation I/flutter: socket:开启重新连接
2022-07-14 22:14:54.576 17786-17877/com.acewill.kvs_operation I/flutter: socket:连接出错
2022-07-14 22:14:54.577 17786-17877/com.acewill.kvs_operation I/flutter: socket:开启重新连接
2022-07-14 22:14:54.577 17786-17877/com.acewill.kvs_operation I/flutter: socket:连接出错
2022-07-14 22:14:54.578 17786-17877/com.acewill.kvs_operation I/flutter: socket:开启重新连接
2022-07-14 22:14:54.579 17786-17877/com.acewill.kvs_operation I/flutter: socket:连接出错
2022-07-14 22:14:54.579 17786-17877/com.acewill.kvs_operation I/flutter: socket:开启重新连接
...
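Damn!!! It still doesn't work!!! The 连接出错 ("connection error") line is printed 1, 2, 4, ... times, growing as powers of 2. How do we fix this one?
1.3 Fixing excessive retries, again
Excessive retries are still happening, but the retries are now clustered around the same moments, so it looks as if some code inside onConnect is being executed repeatedly. Going through it line by line, the only candidate is socket.onConnectError, which runs again on every pass. Here is its implementation: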
void onConnectError(EventHandler handler) {
on('connect_error', handler);
}
void on(String event, EventHandler handler) {
this._events.putIfAbsent(event, () => new List<EventHandler>());
this._events[event].add(handler);
}
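The effect of an add-style on() can be simulated without the library. The sketch below is a toy model: ToyEmitter and simulateRounds are hypothetical names, and it assumes that all onConnect calls of one round register their handler on the same cached socket and that the round produces a single connect_error event, which is consistent with the log above but not verified against the library source.

```dart
typedef Handler = void Function(dynamic data);

// Toy emitter whose on() appends handlers, like the excerpt above.
class ToyEmitter {
  final Map<String, List<Handler>> _events = {};

  void on(String event, Handler handler) {
    _events.putIfAbsent(event, () => <Handler>[]).add(handler);
  }

  void emit(String event, [dynamic data]) {
    // Iterate over a copy so handlers may register more handlers safely.
    for (final handler in List<Handler>.of(_events[event] ?? const <Handler>[])) {
      handler(data);
    }
  }

  int listenerCount(String event) => _events[event]?.length ?? 0;
}

/// Returns how many error callbacks fire in each of [rounds] retry rounds.
List<int> simulateRounds(int rounds) {
  final emitter = ToyEmitter();
  final errorsPerRound = <int>[];
  var errors = 0;
  var pendingConnectCalls = 1; // the initial onConnect()
  for (var round = 0; round < rounds; round++) {
    // Every onConnect() call registers one more handler on the same socket.
    for (var i = 0; i < pendingConnectCalls; i++) {
      emitter.on('connect_error', (_) {
        errors++;
      });
    }
    emitter.emit('connect_error'); // one failed connection attempt
    errorsPerRound.add(errors);
    pendingConnectCalls = errors; // each callback schedules one onConnect()
    errors = 0;
  }
  return errorsPerRound;
}
```

Under these assumptions the number of callbacks doubles every round, matching the 1, 2, 4 pattern in the log.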
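on() appends the handler to a list instead of replacing the previous one, so every pass through onConnect added one more connect_error callback to the same cached socket. The handlers therefore need to be registered only once, when the socket is created. The reorganized code: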
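import 'package:socket_io_client/socket_io_client.dart' as IO;

class SocketIoUtil {
  // The declaration of `socket` is not shown in the original excerpt;
  // it is assumed here to be a static field (pre-null-safety Dart).
  static IO.Socket socket;
  static bool retryConnect = false;
  static var messDate;

  static Future socketIo() async {
    retryConnect = true;
    onConnect();
  }

  // Builds the socket and registers every handler exactly once.
  static IO.Socket createSocket(String url) {
    var option = IO.OptionBuilder()
        .setTransports(['websocket'])
        .disableReconnection() // retries are driven by _retryConnectSocketIo
        .disableAutoConnect() // connect() is called explicitly in onConnect
        .build();
    IO.Socket socket = IO.io(url, option);
    socket.on("message", (data) {
      onMessage(data.toString());
    });
    socket.onDisconnect((data) {
      print("连接断开 "); // "connection lost"
      // EventBus and Event are app-specific classes not shown in this post.
      EventBus().emit(Event.eventNet, '服务连接断开'); // "service connection lost"
      _retryConnectSocketIo();
    });
    socket.onConnect((data) {
      print("socketIo连接成功"); // "socketIo connected"
      socket.emit("join_group", ["refreshwake"]);
      EventBus().emit(Event.eventNet, '网络状态良好'); // "network is healthy"
    });
    socket.onConnectError((data) {
      print("socket:连接出错"); // "connection error"
      _retryConnectSocketIo();
    });
    return socket;
  }

  static Future dispose() async {
    retryConnect = false;
    socket?.disconnect();
    socket = null;
    messDate = null;
    return null;
  }

  static Future onConnect() async {
    print("socket:onConnect");
    String connectUrl = "http://www.xxx.com:1414";
    if (socket != null) {
      // Only rebuild the socket when the target URL has changed.
      if (socket.io.uri != connectUrl) {
        // Note: dispose() also sets retryConnect to false.
        dispose();
        socket = createSocket(connectUrl);
      }
    } else {
      socket = createSocket(connectUrl);
    }
    socket.connect();
  }

  static onMessage(String string) {
    // Business handling of the received message (omitted in the original).
  }

  static _retryConnectSocketIo() {
    if (retryConnect) {
      print("socket:开启重新连接"); // "starting reconnect"
      Future.delayed(Duration(seconds: 10), () {
        onConnect();
      });
    }
  }
}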
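The code above runs correctly, and with that the pit is finally filled.
Summary
1. The socket reconnects automatically by default.
2. When the address and port are the same, you get the same socket back.
3. Registering a socket listener adds to the existing handlers rather than replacing them.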
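As an extra safety net against the retry pile-ups seen in this post, the retry itself can be guarded so that at most one is pending at a time. This is not part of the fix above, just a minimal sketch in plain Dart: RetryScheduler and its methods are hypothetical names, and it uses null-safe syntax (Timer?), so it needs Dart 2.12 or later with null safety enabled.

```dart
import 'dart:async';

/// Allows at most one pending retry at a time (hypothetical helper).
class RetryScheduler {
  Timer? _pending;

  /// Schedules [action] after [delay] unless a retry is already pending.
  /// Returns true if a new retry was scheduled, false if it was ignored.
  bool schedule(Duration delay, void Function() action) {
    if (_pending?.isActive ?? false) {
      return false; // a retry is already waiting; do not stack another one
    }
    _pending = Timer(delay, action);
    return true;
  }

  /// Cancels the pending retry, if any.
  void cancel() {
    _pending?.cancel();
    _pending = null;
  }
}
```

With something like this in place of the bare Future.delayed call, duplicate error callbacks would collapse into a single onConnect call per 10 s window instead of multiplying.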