TCP extensions for high-performance networks: the problems TCP runs into on paths with high bandwidth and high delay.
2. Long fat pipes
TCP performance problems arise when the bandwidth*delay product is
large. We refer to an Internet path operating in this region as a
"long, fat pipe", and a network containing this path as an "LFN"
(pronounced "elephan(t)")
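As a rough illustration of what "large" means here, the sketch below computes the bandwidth*delay product for one path and compares it with the largest window an unscaled 16-bit window field can advertise. The path figures (1 Gbit/s, 100 ms RTT) and the function name are assumptions chosen for the example, not values from the RFC.

```python
def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep the path full."""
    return bandwidth_bps * rtt_seconds / 8


# Largest value the 16-bit window field can carry without scaling.
MAX_UNSCALED_WINDOW = 2**16 - 1

# Hypothetical path: 1 Gbit/s with a 100 ms round-trip time.
RTT = 0.100
bdp = bandwidth_delay_product_bytes(1e9, RTT)

# With at most one window in flight per RTT, throughput is capped at window / RTT.
window_limited_throughput_bps = MAX_UNSCALED_WINDOW * 8 / RTT

print(f"BDP: {bdp:,.0f} bytes")
print(f"Throughput cap with a 64 KB window: {window_limited_throughput_bps / 1e6:.2f} Mbit/s")
```

On this assumed path the pipe holds far more data than a 64 KB window allows, so the sender idles for most of each round trip.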
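3. Problems with long fat pipes
3.1 Window size limit
The TCP window field is 2 bytes, so the window is capped at 64 KB. That is too small for a long fat pipe, so a TCP option is used to enlarge it.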
To circumvent this problem, Section 2 of this memo defines a
new TCP option, "Window Scale", to allow windows larger than
2**16. This option defines an implicit scale factor, which
is used to multiply the window size value found in a TCP
header to obtain the true window size.
This option may be sent in an initial <SYN> segment (i.e., a
segment with the SYN bit on and the ACK bit off). It may also
be sent in a <SYN,ACK> segment, but only if a Window Scale
option was received in the initial <SYN> segment. A Window Scale
option in a segment without a SYN bit should be ignored.
The Window field in a SYN (i.e., a <SYN> or <SYN,ACK>) segment
itself is never scaled.
TCP Window Scale Option (WSopt):
Kind: 3 Length: 3 bytes
+---------+---------+---------+
| Kind=3 |Length=3 |shift.cnt|
+---------+---------+---------+
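A minimal sketch of how the three option bytes and the scaling rule could be expressed. The function names are made up for this example. The upper bound of 14 on shift.cnt comes from the RFC itself, not from the excerpt above.

```python
import struct

WSOPT_KIND = 3
WSOPT_LENGTH = 3
MAX_SHIFT = 14  # limit on shift.cnt stated in the RFC, not in the excerpt above


def encode_window_scale(shift_cnt: int) -> bytes:
    """Build the 3-byte Window Scale option: Kind, Length, shift.cnt."""
    if not 0 <= shift_cnt <= MAX_SHIFT:
        raise ValueError("shift.cnt must be between 0 and 14")
    return struct.pack("!BBB", WSOPT_KIND, WSOPT_LENGTH, shift_cnt)


def true_window(header_window: int, shift_cnt: int, is_syn: bool = False) -> int:
    """Scale the 16-bit header value; the window in a SYN segment is never scaled."""
    if is_syn:
        return header_window
    return header_window << shift_cnt


print(encode_window_scale(7).hex())
print(true_window(65535, 7))
```

Multiplying by a power of two is a left shift, which is why the option carries a shift count rather than an arbitrary factor.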
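3.2 Packet loss recovery
On a long fat pipe, retransmission is a major problem for TCP: losing the segment at the head of the window leads to retransmission timeouts (RTO), and this needs to be addressed. Fast Retransmit and Fast Recovery handle the loss of a single packet per window; losing more than one packet leads to a retransmission timeout and slow start (this refers to the early congestion control algorithms). Selective acknowledgment is meant to solve the multiple-loss problem of Fast Retransmit/Fast Recovery.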
Recently, the Fast Retransmit and Fast Recovery
algorithms [Jacobson90c] have been introduced. Their
combined effect is to recover from one packet loss per
window, without draining the pipeline. However, more than
one packet loss per window typically results in a
retransmission timeout and the resulting pipeline drain and
slow start.
To generalize the Fast Retransmit/Fast Recovery mechanism to
handle multiple packets dropped per window, selective
acknowledgments are required.
However, in the non-LFN
regime, selective acknowledgments reduce the number of
packets retransmitted but do not otherwise improve
performance, making their complexity of questionable value.
However, selective acknowledgments are expected to become
much more important in the LFN regime.
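3.3 RTT measurement
The third problem with long fat pipes is RTT measurement. When there are many retransmissions, the RTT cannot be measured: on receiving an ACK, the sender cannot tell whether it acknowledges the original transmission or a retransmission. Under prolonged congestion the RTT estimate is never updated, so the RTO becomes inaccurate, which in turn hurts retransmission behavior.
4. Approaches to solving the problems
4.1 Handling duplicate sequence numbers
Duplicate TCP sequence numbers undermine TCP's reliability.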
Duplication of sequence numbers might happen in either of two
ways:
(1) Sequence number wrap-around on the current connection
A TCP sequence number contains 32 bits. At a high enough
transfer rate, the 32-bit sequence space may be "wrapped"
(cycled) within the time that a segment is delayed in queues.
(2) Earlier incarnation of the connection
Suppose that a connection terminates, either by a proper
close sequence or due to a host crash, and the same
connection (i.e., using the same pair of sockets) is
immediately reopened. A delayed segment from the terminated
connection could fall within the current window for the new
incarnation and be accepted as valid.
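Approach
Problem (1): when wrap-around becomes possible (B is the bandwidth)
Wrap-around within one MSL is possible when
2**31 / B < MSL (secs) [1]
with B in bytes per second. Expressed in bits per second, with MSL = 120 s:
B > 2**34/MSL (bps) = 16G/120 ~= 143M bps
This can be fixed either with 64-bit sequence numbers (not compatible with existing TCP) or with PAWS, which relies on the timestamp option.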
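The inequality can be checked numerically. The sketch below is illustrative only: it assumes B is the sustained rate in bits per second, converts it to bytes per second, and uses MSL = 120 s as in the notes. The function names are hypothetical.

```python
MSL_SECONDS = 120  # maximum segment lifetime used in the notes


def seconds_to_wrap(bandwidth_bps: float) -> float:
    """Time to send 2**31 bytes, i.e. half of the 32-bit sequence space."""
    bytes_per_second = bandwidth_bps / 8
    return 2**31 / bytes_per_second


def can_wrap_within_msl(bandwidth_bps: float, msl: float = MSL_SECONDS) -> bool:
    """True if an old duplicate could still be alive when its sequence number is reused."""
    return seconds_to_wrap(bandwidth_bps) < msl


for rate in (10e6, 100e6, 1e9, 10e9):
    print(f"{rate / 1e6:>8.0f} Mbit/s: wraps in {seconds_to_wrap(rate):10.1f} s, "
          f"at risk: {can_wrap_within_msl(rate)}")
```

At 1 Gbit/s the wrap time is under 20 seconds, far below one MSL, which is why sequence numbers alone cannot reject old duplicates at these speeds.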
A possible fix for the problem of cycling the sequence space would
be to increase the size of the TCP sequence number field. For
example, the sequence number field (and also the acknowledgment
field) could be expanded to 64 bits. This could be done either by
changing the TCP header or by means of an additional option.
PAWS uses the TCP Timestamps option
defined in Section 4 to protect against old duplicates from the
same connection.
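Problem (2) can be solved by waiting 2*MSL before the connection is reused. If that wait is skipped, for example by a fast-recycling mechanism,
then in extreme cases the new incarnation may accept data that does not belong to it.
4.2 Cost of the TCP timestamp option
The option adds 12 bytes to the 20-byte header (1 kind + 1 length + 4 TSval + 4 TSecr + 2 padding). This overhead only pays off when the retransmissions it avoids cost more than the extra header bytes. TCP headers keep getting larger, so on a poor network this option can be dropped.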
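To put a number on the overhead, here is a small back-of-the-envelope calculation. It assumes a 1500-byte MTU and a 20-byte IPv4 header with no IP options; both figures are assumptions for the example, not something the notes specify.

```python
def payload_per_packet(mtu: int, ip_header: int = 20, tcp_header: int = 20,
                       tcp_options: int = 0) -> int:
    """Bytes of application data that fit in one packet."""
    return mtu - ip_header - tcp_header - tcp_options


MTU = 1500               # assumed Ethernet-style MTU
TIMESTAMP_OPTION = 12    # 10 option bytes plus 2 bytes of padding

without_ts = payload_per_packet(MTU)
with_ts = payload_per_packet(MTU, tcp_options=TIMESTAMP_OPTION)
payload_loss = 1 - with_ts / without_ts

print(f"payload without timestamps: {without_ts} bytes")
print(f"payload with timestamps:    {with_ts} bytes")
print(f"goodput given up:           {payload_loss:.2%}")
```

Under these assumptions the option costs under one percent of the payload per full-sized packet; the relative cost is much higher for small segments.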
5. RTT measurement methods
The statistical approach and the RTTM mechanism
A good RTT estimator with a conservative retransmission timeout
calculation can tolerate aliasing when the sampling frequency is
"close" to the data frequency. For example, with a window of 8
packets, the sample rate is 1/8 the data frequency -- less than an
order of magnitude different. However, when the window is tens or
hundreds of packets, the RTT estimator may be seriously in error,
resulting in spurious retransmissions.
Using TCP options, the sender places
a timestamp in each data segment, and the receiver reflects these
timestamps back in ACK segments. Then a single subtract gives the
sender an accurate RTT measurement for every ACK segment (which
will correspond to every other data segment, with a sensible
receiver). We call this the RTTM (Round-Trip Time Measurement)
mechanism.
A TSecr value received in a segment is used to update the
averaged RTT measurement only if the segment acknowledges
some new data, i.e., only if it advances the left edge of the
send window.
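The sender side of RTTM can be sketched as follows. This is a simplified illustration, not the RFC's code: the class name is invented, timestamps are plain integer ticks, sequence-number wrap-around is ignored, and the smoothing gain of 1/8 is an assumed value for the averaging step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RttEstimator:
    snd_una: int = 0               # left edge of the send window
    srtt: Optional[float] = None   # smoothed RTT, in timestamp ticks
    alpha: float = 0.125           # assumed smoothing gain

    def on_ack(self, ack: int, tsecr: int, now: int) -> bool:
        """Process one ACK; return True if an RTT sample was taken."""
        # Only ACKs that advance the left edge update the estimate
        # (sequence-number wrap-around is ignored in this sketch).
        if ack <= self.snd_una:
            return False
        self.snd_una = ack
        sample = now - tsecr       # the "single subtract"
        if self.srtt is None:
            self.srtt = float(sample)
        else:
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample
        return True


est = RttEstimator(snd_una=1000)
est.on_ack(ack=2000, tsecr=100, now=150)   # new data acked: sample of 50 ticks
est.on_ack(ack=2000, tsecr=100, now=400)   # duplicate ACK: ignored
est.on_ack(ack=3000, tsecr=200, now=290)   # new data acked: sample of 90 ticks
print(est.srtt)
```

Because the sample is derived from the echoed timestamp rather than from a per-segment timer, a retransmitted segment carries its own fresh TSval and the ambiguity described in section 3.3 does not arise.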
TCP Timestamps Option (TSopt):
Kind: 8
Length: 10 bytes
+-------+-------+---------------------+---------------------+
|Kind=8 |  10   |   TS Value (TSval)  |TS Echo Reply (TSecr)|
+-------+-------+---------------------+---------------------+
    1       1              4                     4
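5.1 Which segment's timestamp should TCP echo
1. Delayed ACKs -- echo the earliest unacknowledged segment; the measured RTT is then higher than the actual path RTT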
Many TCP's acknowledge only every Kth segment out of a group
of segments arriving within a short time interval; this
policy is known generally as "delayed ACKs". The data-sender
TCP must measure the effective RTT, including the additional
time due to delayed ACKs, or else it will retransmit
unnecessarily. Thus, when delayed ACKs are in use, the
receiver should reply with the TSval field from the earliest
unacknowledged segment.
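2. A hole in the sequence space -- echo the timestamp of the most recent segment that advanced the window, not that of the out-of-order segment; this overestimates the RTT, which is the conservative choice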
The lost segment is probably a sign of congestion, and in
that situation the sender should be conservative about
retransmission. Furthermore, it is better to overestimate
than underestimate the RTT. An ACK for an out-of-order
segment should therefore contain the timestamp from the most
recent segment that advanced the window.
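3. Filling the hole -- the timestamp of the segment that fills the hole must be echoed; it gives the most recent and most accurate measurement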
The segment that fills the hole represents the most recent
measurement of the network characteristics. On the other
hand, an RTT computed from an earlier segment would probably
include the sender's retransmit time-out, badly biasing the
sender's average RTT estimate. Thus, the timestamp from the
latest segment (which filled the hole) must be echoed.
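All three cases follow from one receiver rule in RFC 1323: keep a variable TS.Recent, and update it from an arriving segment only when Last.ACK.sent falls inside that segment's sequence range. The sketch below models that rule. The class and its reassembly bookkeeping are hypothetical simplifications (no sequence wrap-around, no overlapping segments); only the names TS.Recent and Last.ACK.sent come from the RFC.

```python
class EchoState:
    """Receiver-side choice of the timestamp to echo in TSecr."""

    def __init__(self, rcv_nxt: int = 0):
        self.rcv_nxt = rcv_nxt         # next in-order sequence number expected
        self.last_ack_sent = rcv_nxt   # Last.ACK.sent
        self.ts_recent = 0             # TS.Recent
        self.out_of_order = {}         # seq -> length of buffered segments

    def on_segment(self, seq: int, length: int, tsval: int) -> None:
        # RFC 1323 rule: SEG.SEQ <= Last.ACK.sent < SEG.SEQ + SEG.LEN
        if seq <= self.last_ack_sent < seq + length:
            self.ts_recent = tsval
        if seq == self.rcv_nxt:
            self.rcv_nxt += length
            # Absorb buffered segments that are now contiguous.
            while self.rcv_nxt in self.out_of_order:
                self.rcv_nxt += self.out_of_order.pop(self.rcv_nxt)
        elif seq > self.rcv_nxt:
            self.out_of_order[seq] = length

    def send_ack(self) -> tuple:
        """Return (ACK number, TSecr) for the ACK being sent."""
        self.last_ack_sent = self.rcv_nxt
        return self.rcv_nxt, self.ts_recent


# Delayed ACK: two segments arrive before one ACK; the earlier timestamp is echoed.
r = EchoState()
r.on_segment(0, 100, tsval=1)
r.on_segment(100, 100, tsval=2)
print(r.send_ack())
```

A segment arriving above a hole starts beyond Last.ACK.sent, so it leaves TS.Recent alone; the segment that fills the hole starts exactly at Last.ACK.sent, so its timestamp replaces it.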
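5.2 PAWS: PROTECT AGAINST WRAPPED SEQUENCE NUMBERS (how to reject duplicate sequence numbers)
Besides RTT calculation, the timestamp is also used to check whether a segment is valid: timestamps on a connection never decrease, so a segment whose timestamp is older than the most recently accepted one is dropped.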
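A minimal sketch of the PAWS comparison, assuming 32-bit timestamps compared modulo 2**32 so that the timestamp clock itself may wrap. The function names are invented for the example, and details of the real mechanism (SYN handling, invalidating TS.Recent after a long idle period) are left out.

```python
def ts_before(a: int, b: int) -> bool:
    """True if 32-bit timestamp a is older than b, compared modulo 2**32."""
    return a != b and ((b - a) & 0xFFFFFFFF) < 2**31


def paws_accept(seg_tsval: int, ts_recent: int) -> bool:
    """Reject a segment whose timestamp is older than the last one accepted."""
    return not ts_before(seg_tsval, ts_recent)


print(paws_accept(seg_tsval=10, ts_recent=5))            # newer timestamp
print(paws_accept(seg_tsval=5, ts_recent=10))            # old duplicate
print(paws_accept(seg_tsval=0x10, ts_recent=0xFFFFFFF0)) # timestamp clock wrapped
```

The modular comparison treats a timestamp as newer when it is less than 2**31 ticks ahead, so a wrapped timestamp clock does not cause valid segments to be rejected.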