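Each direction of a connection tracks the following three values:
1) td_maxend is the maximum sequence number. It is derived from reply packets in the opposite direction: the highest acknowledged sequence number (SACK) plus the window.
2) td_maxwin is the maximum window. It is derived from packets sent in this direction: the window advertised in the TCP header plus the space used by out-of-order segments (sack - ack). If the end sequence of a packet exceeds the maximum sequence td_maxend, the excess is added to the maximum window (in the code this is applied to receiver->td_maxwin).
3) td_end is the end sequence number. It is derived from packets sent in this direction: the start sequence number plus the data length.
The TCP sequence check in connection tracking uses the following four criteria.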
I. Upper bound for valid data: seq <= sender.td_maxend
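This is the right edge for valid sequence data: the sequence number of the packet must not exceed sender.td_maxend, the highest sequence number the receiver has permitted the sender to send.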
II. Lower bound for valid data: seq + len >= sender.td_end - receiver.td_maxwin
This is the left edge for valid sequence data: the end of the segment (seq + len) must not fall below sender.td_end - receiver.td_maxwin.
III. Upper bound for valid (s)ack: sack <= receiver.td_end
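This is the upper bound for a valid ACK: the acknowledged sequence number must not exceed receiver.td_end, the highest sequence number sent by the peer (the receiver of this packet).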
IV. Lower bound for valid (s)ack: sack >= receiver.td_end - MAXACKWINDOW
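This is the lower bound for a valid ACK: the acknowledged sequence number must not fall more than MAXACKWINDOW below receiver.td_end.
For criterion II, the end sequence of the data must be at or beyond the left edge of the receiver's window. Assume that the data ending at the sender's end sequence (sender.td_end) fills the receiver's window buffer. Taking that as the right edge of the receiver's window and subtracting the largest receiver window gives the left edge.
For criterion IV, the ACK number must be at or beyond the left edge of the sender's window. Assume that the data ending at the receiver's end sequence (receiver.td_end) fills the sender's window buffer. Taking that as the right edge of the sender's window and subtracting the largest sender window gives the left edge.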
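The four criteria can be tried out in user space. The following is a minimal sketch, not the kernel code: struct dir_state, seq_before, seq_after, max_ack_window and check_criteria are hypothetical names, and the struct keeps only the three values described above. The comparisons use signed 32-bit differences so that they stay correct when sequence numbers wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-direction state (not the kernel struct). */
struct dir_state {
    uint32_t td_end;    /* end sequence (start + length) of data sent */
    uint32_t td_maxend; /* highest sequence the peer permits */
    uint32_t td_maxwin; /* largest window seen */
};

#define MAXACKWINCONST 66000u

/* Wraparound-safe sequence comparisons. */
static bool seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static bool seq_after(uint32_t a, uint32_t b)  { return (int32_t)(b - a) < 0; }

static uint32_t max_ack_window(const struct dir_state *s)
{
    return s->td_maxwin > MAXACKWINCONST ? s->td_maxwin : MAXACKWINCONST;
}

/* Returns a bitmask: bit 0 = criterion I, ..., bit 3 = criterion IV. */
unsigned int check_criteria(const struct dir_state *sender,
                            const struct dir_state *receiver,
                            uint32_t seq, uint32_t end, uint32_t sack)
{
    unsigned int ok = 0;

    assert(sender && receiver);
    if (seq_before(seq, sender->td_maxend + 1))
        ok |= 1u << 0;  /* I:   seq <= sender.td_maxend */
    if (seq_after(end, sender->td_end - receiver->td_maxwin - 1))
        ok |= 1u << 1;  /* II:  end >= sender.td_end - receiver.td_maxwin */
    if (seq_before(sack, receiver->td_end + 1))
        ok |= 1u << 2;  /* III: sack <= receiver.td_end */
    if (seq_after(sack, receiver->td_end - max_ack_window(sender) - 1))
        ok |= 1u << 3;  /* IV:  sack >= receiver.td_end - MAXACKWINDOW */
    return ok;
}
```

A return value of 0xF means all four criteria hold; a cleared bit identifies the criterion that failed.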
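The macro MAXACKWINDOW uses the sender's maximum window td_maxwin, but never less than MAXACKWINCONST (66000). However, if the ACK acknowledges a packet larger than 66000 bytes, criterion IV can give the wrong result, as the Fixme comment notes.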
/* Fixme: what about big packets? */
#define MAXACKWINCONST          66000
#define MAXACKWINDOW(sender)                                        \
    ((sender)->td_maxwin > MAXACKWINCONST ? (sender)->td_maxwin    \
                                          : MAXACKWINCONST)
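TCP sequence number check
The function tcp_in_window below checks the sequence number and ACK number of a packet and updates the corresponding window information.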
static bool tcp_in_window(const struct nf_conn *ct,
                          struct ip_ct_tcp *state,
                          enum ip_conntrack_dir dir,
                          unsigned int index,
                          const struct sk_buff *skb,
                          unsigned int dataoff,
                          const struct tcphdr *tcph)
{
    struct net *net = nf_ct_net(ct);
    struct nf_tcp_net *tn = nf_tcp_pernet(net);
    struct ip_ct_tcp_state *sender = &state->seen[dir];
    struct ip_ct_tcp_state *receiver = &state->seen[!dir];
    const struct nf_conntrack_tuple *tuple = &ct->tuplehash[dir].tuple;
    __u32 seq, ack, sack, end, win, swin;
    u16 win_raw;
    s32 receiver_offset;
    bool res, in_recv_win;
First, read the TCP sequence number, ACK number and window size from the packet, and compute the end sequence number from the data length.
    /*
     * Get the required data from the packet.
     */
    seq = ntohl(tcph->seq);
    ack = sack = ntohl(tcph->ack_seq);
    win_raw = ntohs(tcph->window);
    win = win_raw;
    end = segment_seq_plus_len(seq, skb->len, dataoff, tcph);
If the receiving direction has negotiated the SACK option, fetch the largest sequence number found in the SACK blocks for later use.
    if (receiver->flags & IP_CT_TCP_FLAG_SACK_PERM)
        tcp_sack(skb, dataoff, tcph, &sack);
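Next, correct the ACK number for NAT. If NAT is applied to this connection, some protocols (FTP and others) may require sequence numbers to be adjusted. The ACK number of this packet needs correcting because sequence numbers were adjusted in the opposite direction, so the sequence offset of the opposite direction is looked up and subtracted from the ACK and SACK numbers of the current direction.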
    /* Take into account NAT sequence number mangling */
    receiver_offset = nf_ct_seq_offset(ct, !dir, ack - 1);
    ack -= receiver_offset;
    sack -= receiver_offset;
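The offset correction can be illustrated with a small user-space sketch. It assumes that the lookup chooses between a "before" and an "after" offset around a recorded correction position; struct seq_adjust, seq_offset and unmangle_ack are hypothetical names for illustration and not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of a sequence adjustment made in one direction. */
struct seq_adjust {
    uint32_t correction_pos; /* sequence position of the last adjustment */
    int32_t  offset_before;  /* offset for sequence numbers up to that position */
    int32_t  offset_after;   /* offset for sequence numbers after that position */
};

static bool seq_after(uint32_t a, uint32_t b) { return (int32_t)(b - a) < 0; }

/* Assumed lookup: pick the offset that applies to this sequence number. */
int32_t seq_offset(const struct seq_adjust *adj, uint32_t seq)
{
    assert(adj);
    return seq_after(seq, adj->correction_pos) ? adj->offset_after
                                               : adj->offset_before;
}

/* Undo the adjustment made in the reverse direction, using ack - 1 for the lookup. */
uint32_t unmangle_ack(const struct seq_adjust *reverse, uint32_t ack)
{
    return ack - (uint32_t)seq_offset(reverse, ack - 1);
}
```

With an adjustment of +4 recorded at position 1000, an ACK of 1500 is mapped back to 1496, while an ACK of 900 is left unchanged.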
    pr_debug("tcp_in_window: START\n");
    pr_debug("tcp_in_window: ");
    nf_ct_dump_tuple(tuple);
    pr_debug("seq=%u ack=%u+(%d) sack=%u+(%d) win=%u end=%u\n",
             seq, ack, receiver_offset, sack, receiver_offset, win, end);
    pr_debug("tcp_in_window: sender end=%u maxend=%u maxwin=%u scale=%i "
             "receiver end=%u maxend=%u maxwin=%u scale=%i\n",
             sender->td_end, sender->td_maxend, sender->td_maxwin,
             sender->td_scale,
             receiver->td_end, receiver->td_maxend, receiver->td_maxwin,
             receiver->td_scale);
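The following handles the case where the sender's td_maxwin is zero, meaning that conntrack has not recorded a maximum window for it, i.e. no packet from this sender has been seen. The first packet of the TCP handshake does not need its sequence number validated and is not passed to this function. A packet with the SYN flag set here is therefore either a SYN-ACK or, when both ends open the connection simultaneously, the SYN from the other end.
1) td_end and td_maxend are initialized to the end sequence of the current packet, end.
2) td_maxwin is initialized to the window field of the TCP header of the current packet (at least 1).
3) tcp_options parses the window scale from the TCP options and stores it in td_scale. It also checks whether the SACK capability is advertised and sets the flag IP_CT_TCP_FLAG_SACK_PERM (see its use above).
4) Both ends must advertise the window scale option; otherwise td_scale is set to zero for both.
If the packet is a SYN without the ACK flag, this is a simultaneous open and processing ends here. Otherwise, for a SYN-ACK, processing continues.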
    if (sender->td_maxwin == 0) {
        /*
         * Initialize sender data.
         */
        if (tcph->syn) {
            /*
             * SYN-ACK in reply to a SYN
             * or SYN from reply direction in simultaneous open.
             */
            sender->td_end =
            sender->td_maxend = end;
            sender->td_maxwin = (win == 0 ? 1 : win);

            tcp_options(skb, dataoff, tcph, sender);
            /*
             * RFC 1323:
             * Both sides must send the Window Scale option
             * to enable window scaling in either direction.
             */
            if (!(sender->flags & IP_CT_TCP_FLAG_WINDOW_SCALE
                  && receiver->flags & IP_CT_TCP_FLAG_WINDOW_SCALE))
                sender->td_scale =
                receiver->td_scale = 0;
            if (!tcph->ack)
                /* Simultaneous open */
                return true;
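In contrast to the case above, a packet without the SYN flag must come from the middle of a connection. The SYN-ACK, and possibly later packets, may have been missed, so the sender's state can only be taken from this packet.
1) td_end is set to the end sequence of the current packet.
2) The window value from the packet, shifted left by the window scale, becomes the maximum window td_maxwin (at least 1).
3) The end sequence plus the maximum window becomes the maximum sequence td_maxend.
If the receiver's maximum window td_maxwin is zero, no packet from the receiver has been seen so far. sack is the highest sequence number the sender has received, so it is assigned to the receiver's td_end and td_maxend. Otherwise, if the receiver's td_maxwin is non-zero and sack equals the receiver's td_end + 1, the receiver's td_end is incremented by one; without this, a normal reply to a keepalive would fail criterion III.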
        } else {
            /*
             * We are in the middle of a connection,
             * its history is lost for us.
             * Let's try to use the data from the packet.
             */
            sender->td_end = end;
            swin = win << sender->td_scale;
            sender->td_maxwin = (swin == 0 ? 1 : swin);
            sender->td_maxend = end + sender->td_maxwin;
            if (receiver->td_maxwin == 0) {
                /* We haven't seen traffic in the other
                 * direction yet but we have to tweak window
                 * tracking to pass III and IV until that happens.
                 */
                receiver->td_end = receiver->td_maxend = sack;
            } else if (sack == receiver->td_end + 1) {
                /* Likely a reply to a keepalive. Needed for III. */
                receiver->td_end++;
            }
        }
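The mid-connection pickup can be replayed in user space. This is a minimal sketch under simplifying assumptions: struct dir_state and pickup_midstream are hypothetical names, and the struct holds only the fields this step touches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified per-direction state (not the kernel struct). */
struct dir_state {
    uint32_t td_end;
    uint32_t td_maxend;
    uint32_t td_maxwin;
    uint8_t  td_scale;
};

/* Initialise both directions from a packet seen in mid-connection. */
void pickup_midstream(struct dir_state *sender, struct dir_state *receiver,
                      uint32_t end, uint32_t win, uint32_t sack)
{
    uint32_t swin;

    assert(sender && receiver);
    swin = win << sender->td_scale;
    sender->td_end = end;
    sender->td_maxwin = (swin == 0 ? 1 : swin);
    sender->td_maxend = end + sender->td_maxwin;

    if (receiver->td_maxwin == 0)
        /* Nothing seen from the peer yet: trust the ACK in this packet. */
        receiver->td_end = receiver->td_maxend = sack;
    else if (sack == receiver->td_end + 1)
        /* Likely a reply to a keepalive. */
        receiver->td_end++;
}
```

For a packet with end=5100, win=1000 and sack=9000 arriving on an empty entry, the sender ends up with td_end=5100, td_maxwin=1000, td_maxend=6100, and the receiver with td_end=td_maxend=9000.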
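If the sender's td_maxwin is non-zero, its state has already been initialized. The packet may nevertheless be the first packet of a new handshake in its direction. This is detected as follows: the connection state is SYN_SENT for the original direction, or SYN_RECV for the reply direction, and the end sequence of the packet is after the recorded td_end. This indicates that the TCP connection has been reinitialized, so the sender's sequence and window information is initialized again.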
    } else if (((state->state == TCP_CONNTRACK_SYN_SENT
                 && dir == IP_CT_DIR_ORIGINAL)
                || (state->state == TCP_CONNTRACK_SYN_RECV
                    && dir == IP_CT_DIR_REPLY))
               && after(end, sender->td_end)) {
        /*
         * RFC 793: "if a TCP is reinitialized ... then it need
         * not wait at all; it must only be sure to use sequence
         * numbers larger than those recently used."
         */
        sender->td_end =
        sender->td_maxend = end;
        sender->td_maxwin = (win == 0 ? 1 : win);

        tcp_options(skb, dataoff, tcph, sender);
    }
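If the ACK flag is not set in the TCP header, the ACK number is assumed to be the highest sequence number the receiver has sent (receiver->td_end). If the ACK flag is set but the ACK number is zero and the RST flag is also set, the ACK number is likewise assumed to equal the highest sequence number the receiver has sent, as seen by conntrack.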
    if (!(tcph->ack)) {
        /*
         * If there is no ACK, just pretend it was set and OK.
         */
        ack = sack = receiver->td_end;
    } else if (((tcp_flag_word(tcph) & (TCP_FLAG_ACK|TCP_FLAG_RST)) ==
                (TCP_FLAG_ACK|TCP_FLAG_RST))
               && (ack == 0)) {
        /*
         * Broken TCP stacks, that set ACK in RST packets as well
         * with zero ack value.
         */
        ack = sack = receiver->td_end;
    }
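For an RST sent in reply to a SYN (connection state SYN_SENT) whose sequence number is zero, the sequence number and end sequence are set to the recorded end sequence of the sender.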
    if (tcph->rst && seq == 0 && state->state == TCP_CONNTRACK_SYN_SENT)
        /*
         * RST sent answering SYN.
         */
        seq = end = sender->td_end;

    pr_debug("tcp_in_window: ");
    nf_ct_dump_tuple(tuple);
    pr_debug("seq=%u ack=%u+(%d) sack=%u+(%d) win=%u end=%u\n",
             seq, ack, receiver_offset, sack, receiver_offset, win, end);
    pr_debug("tcp_in_window: sender end=%u maxend=%u maxwin=%u scale=%i "
             "receiver end=%u maxend=%u maxwin=%u scale=%i\n",
             sender->td_end, sender->td_maxend, sender->td_maxwin,
             sender->td_scale,
             receiver->td_end, receiver->td_maxend, receiver->td_maxwin,
             receiver->td_scale);
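If the receiver's td_maxwin is zero (unknown), or the end sequence of the packet is at or beyond the left edge of the receiver's window, criterion II is satisfied. The code then prints the results of the four criteria.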
    /* Is the ending sequence in the receive window (if available)? */
    in_recv_win = !receiver->td_maxwin ||
                  after(end, sender->td_end - receiver->td_maxwin - 1);

    pr_debug("tcp_in_window: I=%i II=%i III=%i IV=%i\n",
             before(seq, sender->td_maxend + 1),
             (in_recv_win ? 1 : 0),
             before(sack, receiver->td_end + 1),
             after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1));
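If the packet satisfies all four criteria, the sender and receiver state is updated. For a non-SYN packet, the window is first scaled by the sender's window scale.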
    if (before(seq, sender->td_maxend + 1) &&
        in_recv_win &&
        before(sack, receiver->td_end + 1) &&
        after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1)) {
        /*
         * Take into account window scaling (RFC 1323).
         */
        if (!tcph->syn)
            win <<= sender->td_scale;
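The sender's maximum window is raised to the window advertised in the packet plus (sack - ack) whenever that sum is larger than the recorded value. The largest SACK sequence minus the ACK number of the packet is the out-of-order data that the sender has received correctly and that lies within the sender's window.
If the end sequence of the packet is after the recorded td_end, td_end is updated. In this case new data has been sent that the receiver has not yet acknowledged, so IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED is set.
If the sender's maximum ACK number td_maxack has not been set, it is initialized here. If it has been set, it is updated when the ACK number of the current packet is after the recorded maximum.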
        /*
         * Update sender data.
         */
        swin = win + (sack - ack);
        if (sender->td_maxwin < swin)
            sender->td_maxwin = swin;
        if (after(end, sender->td_end)) {
            sender->td_end = end;
            sender->flags |= IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED;
        }
        if (tcph->ack) {
            if (!(sender->flags & IP_CT_TCP_FLAG_MAXACK_SET)) {
                sender->td_maxack = ack;
                sender->flags |= IP_CT_TCP_FLAG_MAXACK_SET;
            } else if (after(ack, sender->td_maxack))
                sender->td_maxack = ack;
        }
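The window scaling and the sender-side update can be worked through with concrete numbers. This is a sketch, assuming a simplified state: struct dir_state, the data_unacked field (standing in for IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED) and update_sender are hypothetical names, and the td_maxack bookkeeping is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified sender state (not the kernel struct). */
struct dir_state {
    uint32_t td_end;
    uint32_t td_maxwin;
    uint8_t  td_scale;
    bool     data_unacked; /* stands in for IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED */
};

static bool seq_after(uint32_t a, uint32_t b) { return (int32_t)(b - a) < 0; }

void update_sender(struct dir_state *sender, bool syn, uint32_t win_raw,
                   uint32_t end, uint32_t ack, uint32_t sack)
{
    uint32_t win = win_raw;
    uint32_t swin;

    assert(sender);
    if (!syn)
        win <<= sender->td_scale;   /* the window in a SYN is never scaled */

    swin = win + (sack - ack);      /* advertised window + out-of-order data */
    if (sender->td_maxwin < swin)
        sender->td_maxwin = swin;

    if (seq_after(end, sender->td_end)) {
        sender->td_end = end;       /* new data, not yet acknowledged */
        sender->data_unacked = true;
    }
}
```

With a scale of 7, a raw window of 502 becomes 502 << 7 = 64256. If the packet also carries a SACK block ending 2896 bytes beyond its ACK number, the candidate window is 64256 + 2896 = 67152.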
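If the receiver's maximum window td_maxwin is non-zero and the end sequence of the packet is after the sender's maximum sequence td_maxend, the receiver's maximum window is increased by (end - sender->td_maxend). The assumption is that the sender honours the window advertised by the receiver and never sends more data than that window allows, so data beyond td_maxend implies a larger receiver window than the one recorded.
The receiver's maximum sequence td_maxend becomes the largest acknowledged sequence in the packet plus the window (sack + win), and it is never reduced. When win is zero, td_maxend is incremented by one to leave room for a keepalive probe.
If the ACK number of the packet equals the receiver's end sequence, all of the receiver's data has been acknowledged and IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED is cleared.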
        /*
         * Update receiver data.
         */
        if (receiver->td_maxwin != 0 && after(end, sender->td_maxend))
            receiver->td_maxwin += end - sender->td_maxend;
        if (after(sack + win, receiver->td_maxend - 1)) {
            receiver->td_maxend = sack + win;
            if (win == 0)
                receiver->td_maxend++;
        }
        if (ack == receiver->td_end)
            receiver->flags &= ~IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED;
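The receiver-side update can be checked the same way. The following sketch mirrors the three steps with hypothetical names (struct dir_state, data_unacked, update_receiver); win is assumed to be already scaled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-direction state (not the kernel struct). */
struct dir_state {
    uint32_t td_end;
    uint32_t td_maxend;
    uint32_t td_maxwin;
    bool     data_unacked; /* stands in for IP_CT_TCP_FLAG_DATA_UNACKNOWLEDGED */
};

static bool seq_after(uint32_t a, uint32_t b) { return (int32_t)(b - a) < 0; }

void update_receiver(const struct dir_state *sender, struct dir_state *receiver,
                     uint32_t end, uint32_t win, uint32_t ack, uint32_t sack)
{
    assert(sender && receiver);

    /* Data beyond td_maxend: the receiver's window is larger than recorded. */
    if (receiver->td_maxwin != 0 && seq_after(end, sender->td_maxend))
        receiver->td_maxwin += end - sender->td_maxend;

    /* td_maxend only grows; a zero window still leaves room for one probe byte. */
    if (seq_after(sack + win, receiver->td_maxend - 1)) {
        receiver->td_maxend = sack + win;
        if (win == 0)
            receiver->td_maxend++;
    }

    /* Everything the receiver sent has now been acknowledged. */
    if (ack == receiver->td_end)
        receiver->data_unacked = false;
}
```

For example, with sender.td_maxend=6000 and a packet ending at 6100, the receiver's td_maxwin grows by 100. A packet with ack=sack=9000 and win=2000 moves a receiver td_maxend of 9500 up to 11000, while a td_maxend of 12000 is left untouched.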
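For packets handled with index TCP_ACK_SET, if the direction, sequence number, ACK number, end sequence (i.e. packet length) and advertised window all match the previous packet, the packet is a retransmission and the retrans counter is incremented. Otherwise the recorded values are updated and the counter is reset.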
        /*
         * Check retransmissions.
         */
        if (index == TCP_ACK_SET) {
            if (state->last_dir == dir
                && state->last_seq == seq
                && state->last_ack == ack
                && state->last_end == end
                && state->last_win == win_raw)
                state->retrans++;
            else {
                state->last_dir = dir;
                state->last_seq = seq;
                state->last_ack = ack;
                state->last_end = end;
                state->last_win = win_raw;
                state->retrans = 0;
            }
        }
        res = true;
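The retransmission bookkeeping is a compare-and-remember step. A minimal user-space sketch, with the hypothetical names struct last_seen and note_ack_packet, behaves like this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of the last ACK-only packet seen on a connection. */
struct last_seen {
    int          dir;
    uint32_t     seq, ack, end;
    uint16_t     win;      /* raw, unscaled window from the header */
    unsigned int retrans;  /* consecutive identical packets */
};

void note_ack_packet(struct last_seen *st, int dir, uint32_t seq,
                     uint32_t ack, uint32_t end, uint16_t win_raw)
{
    assert(st);
    if (st->dir == dir && st->seq == seq && st->ack == ack &&
        st->end == end && st->win == win_raw) {
        st->retrans++;          /* same packet again */
    } else {
        st->dir = dir;          /* something changed: remember and reset */
        st->seq = seq;
        st->ack = ack;
        st->end = end;
        st->win = win_raw;
        st->retrans = 0;
    }
}
```

Feeding the same packet three times leaves retrans at 2; a packet with a different ACK number resets it to 0.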
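Finally, if the packet fails the criteria, it is still accepted when the sender has IP_CT_TCP_FLAG_BE_LIBERAL set or the per-net tcp_be_liberal setting is enabled. Otherwise an error message naming the failed criterion is logged.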
    } else {
        res = false;
        if (sender->flags & IP_CT_TCP_FLAG_BE_LIBERAL ||
            tn->tcp_be_liberal)
            res = true;
        if (!res) {
            nf_ct_l4proto_log_invalid(skb, ct,
            "%s",
            before(seq, sender->td_maxend + 1) ?
            in_recv_win ?
            before(sack, receiver->td_end + 1) ?
            after(sack, receiver->td_end - MAXACKWINDOW(sender) - 1) ? "BUG"
            : "ACK is under the lower bound (possible overly delayed ACK)"
            : "ACK is over the upper bound (ACKed data not seen yet)"
            : "SEQ is under the lower bound (already ACKed data retransmitted)"
            : "SEQ is over the upper bound (over the window of the receiver)");
        }
    }
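The nested conditional expression is easier to follow when unrolled. The sketch below returns the same messages from a chain of if statements. The function name invalid_reason and the bitmask encoding (bit 0 = criterion I up to bit 3 = criterion IV) are conventions of this sketch and do not exist in the kernel.

```c
#include <assert.h>

/* ok: bit 0 = criterion I holds, bit 1 = II, bit 2 = III, bit 3 = IV. */
const char *invalid_reason(unsigned int ok)
{
    assert((ok & ~0xFu) == 0);

    /* The first failed criterion decides the message, in the order I..IV. */
    if (!(ok & 1u))
        return "SEQ is over the upper bound (over the window of the receiver)";
    if (!(ok & 2u))
        return "SEQ is under the lower bound (already ACKed data retransmitted)";
    if (!(ok & 4u))
        return "ACK is over the upper bound (ACKed data not seen yet)";
    if (!(ok & 8u))
        return "ACK is under the lower bound (possible overly delayed ACK)";

    /* All four hold, yet the packet was rejected: this should not happen. */
    return "BUG";
}
```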
At the end of the function, the sender and receiver sequence and window information is printed.
    pr_debug("tcp_in_window: res=%u sender end=%u maxend=%u maxwin=%u "
             "receiver end=%u maxend=%u maxwin=%u\n",
             res, sender->td_end, sender->td_maxend, sender->td_maxwin,
             receiver->td_end, receiver->td_maxend, receiver->td_maxwin);

    return res;
}
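Computing the TCP end sequence number
In the function segment_seq_plus_len below, the end sequence is the start sequence plus the length of the TCP data. The data length is the total packet length minus the offset of the TCP header, minus the length of the TCP header (including options). In addition, the SYN and FIN flags each consume one sequence number.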
static inline __u32 segment_seq_plus_len(__u32 seq,
                                         size_t len,
                                         unsigned int dataoff,
                                         const struct tcphdr *tcph)
{
    /* XXX Should I use payload length field in IP/IPv6 header ?
     * - YK */
    return (seq + len - dataoff - tcph->doff*4
            + (tcph->syn ? 1 : 0) + (tcph->fin ? 1 : 0));
}
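A few worked cases make the arithmetic concrete. This user-space sketch uses a hypothetical struct mini_tcphdr holding only the three header fields the calculation needs; seq_plus_len is likewise an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct tcphdr fields used here. */
struct mini_tcphdr {
    unsigned int doff; /* TCP header length in 32-bit words */
    unsigned int syn;
    unsigned int fin;
};

/* skb_len: total packet length; dataoff: offset of the TCP header. */
uint32_t seq_plus_len(uint32_t seq, size_t skb_len, unsigned int dataoff,
                      const struct mini_tcphdr *th)
{
    uint32_t payload;

    assert(th);
    payload = (uint32_t)(skb_len - dataoff - th->doff * 4u);
    /* SYN and FIN each occupy one sequence number. */
    return seq + payload + (th->syn ? 1u : 0u) + (th->fin ? 1u : 0u);
}
```

A SYN with seq=1000, a 20-byte IP header and a 40-byte TCP header (doff=10) carries no data, so its end sequence is 1001. A 100-byte data segment starting at 5000 ends at 5100, and a FIN carrying 10 bytes from 5000 ends at 5011.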
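Largest SACK sequence number
The TCP header is at most 60 bytes; subtracting the fixed-size tcphdr gives the maximum length of the option data. The doff field gives the actual header length, and subtracting the fixed-size tcphdr gives the actual length of the option data. Since packets usually carry only the TIMESTAMP option, that case is handled first as a fast path.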
static void tcp_sack(const struct sk_buff *skb, unsigned int dataoff,
                     const struct tcphdr *tcph, __u32 *sack)
{
    unsigned char buff[(15 * 4) - sizeof(struct tcphdr)];
    const unsigned char *ptr;
    int length = (tcph->doff*4) - sizeof(struct tcphdr);
    __u32 tmp;

    if (!length)
        return;

    ptr = skb_header_pointer(skb, dataoff + sizeof(struct tcphdr),
                             length, buff);
    BUG_ON(ptr == NULL);

    /* Fast path for timestamp-only option */
    if (length == TCPOLEN_TSTAMP_ALIGNED
        && *(__be32 *)ptr == htonl((TCPOPT_NOP << 24)
                                   | (TCPOPT_NOP << 16)
                                   | (TCPOPT_TIMESTAMP << 8)
                                   | TCPOLEN_TIMESTAMP))
        return;
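Next, locate the SACK option and walk its blocks to find the largest sequence number. Each SACK block consists of a start sequence followed by an end sequence; the end sequence of each block is the one examined.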
    while (length > 0) {
        int opcode = *ptr++;
        int opsize, i;

        switch (opcode) {
        case TCPOPT_EOL:
            return;
        case TCPOPT_NOP:    /* Ref: RFC 793 section 3.1 */
            length--;
            continue;
        default:
            if (length < 2)
                return;
            opsize = *ptr++;
            if (opsize < 2) /* "silly options" */
                return;
            if (opsize > length)
                return;     /* don't parse partial options */

            if (opcode == TCPOPT_SACK
                && opsize >= (TCPOLEN_SACK_BASE + TCPOLEN_SACK_PERBLOCK)
                && !((opsize - TCPOLEN_SACK_BASE) % TCPOLEN_SACK_PERBLOCK)) {
                for (i = 0;
                     i < (opsize - TCPOLEN_SACK_BASE);
                     i += TCPOLEN_SACK_PERBLOCK) {
                    tmp = get_unaligned_be32((__be32 *)(ptr+i)+1);

                    if (after(tmp, *sack))
                        *sack = tmp;
                }
                return;
            }
            ptr += opsize - 2;
            length -= opsize;
        }
    }
}
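The option walk can be reproduced on a plain byte buffer. The sketch below is a user-space approximation under stated assumptions: max_sack_edge, read_be32 and the enum constants are hypothetical names (the option kind and length values follow the TCP option encoding), and the timestamp fast path is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OPT_EOL = 0, OPT_NOP = 1, OPT_SACK = 5,
       SACK_BASE = 2, SACK_PERBLOCK = 8 };

static int seq_after(uint32_t a, uint32_t b) { return (int32_t)(b - a) < 0; }

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Raise *sack to the largest SACK right edge found in the option bytes. */
void max_sack_edge(const uint8_t *opt, size_t optlen, uint32_t *sack)
{
    const uint8_t *p = opt;
    size_t len = optlen;

    assert(sack != NULL);
    while (len > 0) {
        uint8_t opcode = *p++;
        uint8_t opsize;

        if (opcode == OPT_EOL)
            return;
        if (opcode == OPT_NOP) {
            len--;
            continue;
        }
        if (len < 2)
            return;
        opsize = *p++;
        if (opsize < 2 || opsize > len)
            return;             /* malformed or truncated option */

        if (opcode == OPT_SACK && opsize >= SACK_BASE + SACK_PERBLOCK &&
            (opsize - SACK_BASE) % SACK_PERBLOCK == 0) {
            for (size_t i = 0; i < (size_t)(opsize - SACK_BASE);
                 i += SACK_PERBLOCK) {
                /* Skip the 4-byte left edge, read the right edge. */
                uint32_t right = read_be32(p + i + 4);

                if (seq_after(right, *sack))
                    *sack = right;
            }
            return;
        }
        p += opsize - 2;        /* skip the body of any other option */
        len -= opsize;
    }
}
```

Given the options NOP, NOP, SACK with blocks [1000, 2000] and [3000, 4000], and an initial value of 500, the result is 4000.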
TCP window scale and SACK options
The function tcp_options detects the SACK_PERM option and sets the flag IP_CT_TCP_FLAG_SACK_PERM accordingly. It also reads the window scale.
static void tcp_options(const struct sk_buff *skb, unsigned int dataoff,
                        const struct tcphdr *tcph, struct ip_ct_tcp_state *state)
{
    unsigned char buff[(15 * 4) - sizeof(struct tcphdr)];
    const unsigned char *ptr;
    int length = (tcph->doff*4) - sizeof(struct tcphdr);

    if (!length)
        return;

    ptr = skb_header_pointer(skb, dataoff + sizeof(struct tcphdr),
                             length, buff);
    BUG_ON(ptr == NULL);

    state->td_scale = state->flags = 0;

    while (length > 0) {
        int opcode = *ptr++;
        int opsize;

        switch (opcode) {
        case TCPOPT_EOL:
            return;
        case TCPOPT_NOP:    /* Ref: RFC 793 section 3.1 */
            length--;
            continue;
        default:
            if (length < 2)
                return;
            opsize = *ptr++;
            if (opsize < 2) /* "silly options" */
                return;
            if (opsize > length)
                return;     /* don't parse partial options */

            if (opcode == TCPOPT_SACK_PERM && opsize == TCPOLEN_SACK_PERM)
                state->flags |= IP_CT_TCP_FLAG_SACK_PERM;
            else if (opcode == TCPOPT_WINDOW && opsize == TCPOLEN_WINDOW) {
                state->td_scale = *(u_int8_t *)ptr;

                if (state->td_scale > TCP_MAX_WSCALE)
                    state->td_scale = TCP_MAX_WSCALE;

                state->flags |= IP_CT_TCP_FLAG_WINDOW_SCALE;
            }
            ptr += opsize - 2;
            length -= opsize;
        }
    }
}
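The same walk, applied to the options of a SYN, can be sketched in user space. The names struct syn_opts and parse_syn_options and the enum constants are hypothetical; the option kinds and lengths follow the TCP option encoding, and the scale limit of 14 is the maximum shift allowed for window scaling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { OPT_EOL = 0, OPT_NOP = 1, OPT_WINDOW = 3, OPT_SACK_PERM = 4,
       LEN_WINDOW = 3, LEN_SACK_PERM = 2, MAX_WSCALE = 14 };

/* Hypothetical result type: what a SYN or SYN-ACK advertised. */
struct syn_opts {
    uint8_t scale;
    bool    window_scale;
    bool    sack_perm;
};

struct syn_opts parse_syn_options(const uint8_t *opt, size_t optlen)
{
    struct syn_opts out = { 0, false, false };
    const uint8_t *p = opt;
    size_t len = optlen;

    assert(opt != NULL || optlen == 0);
    while (len > 0) {
        uint8_t opcode = *p++;
        uint8_t opsize;

        if (opcode == OPT_EOL)
            return out;
        if (opcode == OPT_NOP) {
            len--;
            continue;
        }
        if (len < 2)
            return out;
        opsize = *p++;
        if (opsize < 2 || opsize > len)
            return out;         /* malformed or truncated option */

        if (opcode == OPT_SACK_PERM && opsize == LEN_SACK_PERM) {
            out.sack_perm = true;
        } else if (opcode == OPT_WINDOW && opsize == LEN_WINDOW) {
            out.scale = *p > MAX_WSCALE ? MAX_WSCALE : *p;
            out.window_scale = true;
        }
        p += opsize - 2;        /* move to the next option */
        len -= opsize;
    }
    return out;
}
```

A typical SYN option list of MSS, SACK_PERM, TIMESTAMP, NOP and window scale 7 yields sack_perm and window_scale set with scale 7. An advertised shift of 15 is clamped to 14.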
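Sequence check in the TCP stack
The TCP stack knows the exact receive window [s_win, e_win], so it uses three conditions: the sequence number of the packet equals the left edge of the window; the range from the start sequence to the end sequence of the packet overlaps the window; or, if neither holds, only a data-less ACK whose sequence number equals the right edge is accepted. The stack's tcp_in_window checks only the sequence number of the packet, not the ACK number.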
static bool tcp_in_window(u32 seq, u32 end_seq, u32 s_win, u32 e_win)
{
    if (seq == s_win)
        return true;
    if (after(end_seq, s_win) && before(seq, e_win))
        return true;
    return seq == e_win && seq == end_seq;
}
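The three conditions are easy to exercise outside the kernel. The sketch below reimplements the check with hypothetical names (in_window, seq_before, seq_after); the assertion on the window bounds is an addition of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static bool seq_after(uint32_t a, uint32_t b)  { return (int32_t)(b - a) < 0; }

/* [s_win, e_win] is the receive window; [seq, end_seq) is the segment. */
bool in_window(uint32_t seq, uint32_t end_seq, uint32_t s_win, uint32_t e_win)
{
    assert(!seq_after(s_win, e_win));   /* the window must not be inverted */

    if (seq == s_win)
        return true;                    /* starts exactly at the left edge */
    if (seq_after(end_seq, s_win) && seq_before(seq, e_win))
        return true;                    /* overlaps the window */
    return seq == e_win && seq == end_seq; /* data-less ACK at the right edge */
}
```

With a window of [1000, 2000], a segment [500, 1001) is accepted because it overlaps the window by one byte, a data-less packet at 2000 is accepted, and a segment [2000, 2100) is rejected.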
Kernel version 5.10