SO_REUSEPORT Explained

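The key benefit is multi-threaded load balancing. Multiple sockets can be bound to the same port, and each thread receives its own share of the traffic on its own socket.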

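Linux 3.9 added the SO_REUSEPORT option, which improves the scalability of UDP and TCP servers. Linux 4.5 and 4.6 further improved the UDP and TCP implementations of SO_REUSEPORT, respectively. This article uses the UDP implementation as its example; TCP works in a similar way.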

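The main UDP data structures are two hash tables whose chains point to the UDP protocol control blocks (struct sock). The table hash is keyed by local port, and hash2 is keyed by local IP address plus port.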
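The following simplified model shows how a socket is placed into each table under the two keys. The real kernel also mixes in a per-network-namespace value and hashes the address with jhash. Here mix32, slot_by_port and slot_by_portaddr are hypothetical stand-ins. They keep only the structure: the port alone for hash, and the address hash XOR the port for hash2, both reduced with the table mask.

```c
#include <stdint.h>

/* Hypothetical 32-bit mixer standing in for the kernel's jhash. It is NOT
 * the kernel's function, just something that spreads the address bits. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/* Model of the "hash" table: the key is the local port only. */
uint32_t slot_by_port(uint16_t port, uint32_t mask)
{
    return port & mask;
}

/* Model of the "hash2" table: the key combines local address and port
 * (address hash XOR port, then masked to a slot index). */
uint32_t slot_by_portaddr(uint32_t addr, uint16_t port, uint32_t mask)
{
    return (mix32(addr) ^ port) & mask;
}
```

In this model, every socket bound to port 8000 lands in the same hash slot no matter which local address it uses. The hash2 table can separate those sockets by address.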

The udp_table data structure:

/**
 *	struct udp_table - UDP table
 *
 *	@hash:	hash table, sockets are hashed on (local port)
 *	@hash2:	hash table, sockets are hashed on (local port, local address)
 *	@mask:	number of slots in hash tables, minus 1
 *	@log:	log2(number of slots in hash table)
 */
struct udp_table {
	struct udp_hslot	*hash;
	struct udp_hslot	*hash2;
	unsigned int		mask;
	unsigned int		log;
};
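The mask and log fields encode the table size. The sketch below assumes a power-of-two slot count, as the field comments imply. It shows how both fields are derived from the slot count, and why key & mask can replace a modulo. The names table_geometry and table_geometry_init are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of udp_table's sizing fields (not the kernel struct). */
struct table_geometry {
    unsigned int mask; /* number of slots minus 1 */
    unsigned int log;  /* log2(number of slots) */
};

/* Derive mask and log from a slot count. The count must be a power of two
 * so that "key & mask" equals "key % slots". Returns false otherwise and
 * leaves *g untouched. */
bool table_geometry_init(unsigned int slots, struct table_geometry *g)
{
    if (slots == 0 || (slots & (slots - 1)) != 0)
        return false;
    g->mask = slots - 1;
    g->log = 0;
    while ((1u << g->log) < slots)
        g->log++;
    return true;
}
```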

The udp_hslot data structure:

struct udp_hslot {
	struct hlist_head	head;
	int			count;
	spinlock_t		lock;
} __attribute__((aligned(2 * sizeof(long)));
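Each slot is a chain head plus a counter, protected by its own spinlock, so operations on different slots do not contend with each other. The following is a hypothetical userspace analogue, not kernel code. It is called hslot_sketch, and a C11 atomic_flag stands in for spinlock_t. It keeps count in step with the chain by updating both under the lock.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Minimal singly linked chain node (a stand-in for struct hlist_node). */
struct chain_node {
    struct chain_node *next;
};

/* Hypothetical analogue of struct udp_hslot: chain head, element count,
 * and a spinlock guarding both. */
struct hslot_sketch {
    struct chain_node *head;
    int count;
    atomic_flag lock;
} __attribute__((aligned(2 * sizeof(long))));

void hslot_init(struct hslot_sketch *s)
{
    s->head = NULL;
    s->count = 0;
    atomic_flag_clear(&s->lock);
}

static void hslot_lock(struct hslot_sketch *s)
{
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void hslot_unlock(struct hslot_sketch *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Insert at the head of the chain and bump count while holding the lock. */
void hslot_add(struct hslot_sketch *s, struct chain_node *n)
{
    hslot_lock(s);
    n->next = s->head;
    s->head = n;
    s->count++;
    hslot_unlock(s);
}
```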

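See struct sock_common, which contains the hash-chain nodes skc_node and skc_portaddr_node. The pointer used for reuseport management, sk_reuseport_cb, is a field of struct sock itself, which embeds sock_common.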
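A sock can sit in both tables at once because it embeds two separate list nodes. The kernel recovers the sock from whichever node it is walking by using container_of(). Below is a minimal sketch of that intrusive-list pattern. The sock_sketch and link types are hypothetical.

```c
#include <stddef.h>

/* Same idea as the kernel's container_of(): recover the enclosing struct
 * from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct link {
    struct link *next;
};

/* Hypothetical stand-in for a sock: one object, two embedded link nodes,
 * so it can be on a port-keyed chain and an addr+port-keyed chain at once
 * (compare skc_node and skc_portaddr_node). */
struct sock_sketch {
    unsigned short port;
    struct link port_node;
    struct link portaddr_node;
};

struct sock_sketch *from_port_node(struct link *l)
{
    return container_of(l, struct sock_sketch, port_node);
}

struct sock_sketch *from_portaddr_node(struct link *l)
{
    return container_of(l, struct sock_sketch, portaddr_node);
}
```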

When a UDP datagram arrives, the kernel looks up the matching sock in the hash tables (net/ipv4/udp.c: __udp4_lib_lookup). It then places the datagram on that sock's receive queue.

__udp4_lib_lookup calls udp4_lib_lookup2, which calls lookup_reuseport. That function in turn calls reuseport_select_sock.
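In essence, the lookup walks a chain, scores each candidate, and keeps the best match. A bind to an exact local address outranks a wildcard (INADDR_ANY) bind. The scoring below is a heavily simplified, hypothetical version of that idea. The real compute_score() also checks the remote address and port of connected sockets, the bound device, and the address family. The names cand, score and lookup_sketch are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

#define ANY_ADDR 0u /* wildcard local address, like INADDR_ANY */

/* Hypothetical, heavily simplified socket record for this sketch. */
struct cand {
    uint32_t local_addr; /* ANY_ADDR means bound to the wildcard */
    uint16_t local_port;
};

/* Simplified scoring: -1 means no match; an exact local-address match
 * beats a wildcard bind. */
static int score(const struct cand *c, uint32_t daddr, uint16_t dport)
{
    if (c->local_port != dport)
        return -1;
    if (c->local_addr == ANY_ADDR)
        return 1;
    if (c->local_addr == daddr)
        return 2;
    return -1;
}

/* Return the index of the best-scoring candidate, or -1 if none matches.
 * If the winner is in a reuseport group, the kernel would instead hand off
 * to reuseport_select_sock() to pick a member of that group. */
int lookup_sketch(const struct cand *v, size_t n, uint32_t daddr, uint16_t dport)
{
    int best = -1, best_score = -1;
    for (size_t i = 0; i < n; i++) {
        int s = score(&v[i], daddr, dport);
        if (s > best_score) {
            best_score = s;
            best = (int)i;
        }
    }
    return best;
}
```

Suppose a wildcard socket and a 127.0.0.1 socket are both bound to port 8000. A datagram to 127.0.0.1:8000 then goes to the more specific socket, and a datagram to any other local address falls back to the wildcard one.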

See struct sock_reuseport. It ends with a flexible array member, socks[], which naturally holds the set of sockets that belong to the reuseport group.
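A flexible array member lets the header and the member array live in one allocation. Below is a hypothetical userspace mirror of that layout, called group_sketch, with ints standing in for struct sock pointers. Growing the array by doubling when it is full is an assumption of this sketch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror of the flexible-array layout of struct sock_reuseport;
 * ints stand in for struct sock pointers. */
struct group_sketch {
    uint16_t max_socks; /* capacity of socks[] */
    uint16_t num_socks; /* entries in use */
    int socks[];        /* flexible array member */
};

struct group_sketch *group_alloc(uint16_t max)
{
    struct group_sketch *g = malloc(sizeof(*g) + (size_t)max * sizeof(g->socks[0]));
    if (!g)
        return NULL;
    g->max_socks = max;
    g->num_socks = 0;
    return g;
}

/* Append a member, doubling capacity when full. May move the group; returns
 * the (possibly new) pointer, or NULL on failure, in which case the old
 * group is left intact. */
struct group_sketch *group_add(struct group_sketch *g, int sock)
{
    if (g->num_socks == g->max_socks) {
        if (g->max_socks >= UINT16_MAX / 2)
            return NULL;
        uint16_t new_max = (uint16_t)(g->max_socks ? g->max_socks * 2 : 1);
        struct group_sketch *ng =
            realloc(g, sizeof(*g) + (size_t)new_max * sizeof(g->socks[0]));
        if (!ng)
            return NULL;
        ng->max_socks = new_max;
        g = ng;
    }
    g->socks[g->num_socks++] = sock;
    return g;
}
```

Because realloc() may move the block, callers must continue with the returned pointer rather than the old one.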

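With SO_REUSEPORT enabled, sockets bound to the same local address and port join a single struct sock_reuseport object, which tracks them in its socks array. When a UDP datagram arrives, the kernel first finds any one matching udp_sock. It then follows that socket to its sock_reuseport; whichever member is found, the group it points to is the one that decides. Finally, the kernel uses a hash of the address 4-tuple to choose which socket handles the datagram.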
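The per-packet choice only needs a hash of the 4-tuple (source address and port, destination address and port) that stays stable for a given flow. The sketch below substitutes FNV-1a for the kernel's seeded jhash, and it uses a plain modulo to pick a member. The names struct tuple4, tuple_hash and pick_member are hypothetical.

```c
#include <stdint.h>

/* Hypothetical address 4-tuple of an incoming datagram (host byte order). */
struct tuple4 {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* FNV-1a over the four bytes of v. FNV-1a replaces the kernel's seeded
 * jhash here only to show the property that matters: equal tuples always
 * produce equal hashes. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;
    }
    return h;
}

uint32_t tuple_hash(const struct tuple4 *t)
{
    uint32_t h = 2166136261u;
    h = fnv1a_u32(h, t->saddr);
    h = fnv1a_u32(h, t->daddr);
    h = fnv1a_u32(h, ((uint32_t)t->sport << 16) | t->dport);
    return h;
}

/* Pick a group member for this tuple; num_socks must be > 0. */
unsigned int pick_member(const struct tuple4 *t, unsigned int num_socks)
{
    return tuple_hash(t) % num_socks;
}
```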

See reuseport_select_sock and reuseport_select_sock_by_hash for the code.
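To map the flow hash onto a member index, the kernel's reuseport selection code appears to use reciprocal_scale(hash, num_socks) rather than a modulo. That arithmetic is reproduced below as reciprocal_scale_sketch. The wrapper select_index is hypothetical.

```c
#include <stdint.h>

/* Same arithmetic as the kernel's reciprocal_scale(): maps a 32-bit value
 * onto [0, n) with a multiply and a shift instead of a division. */
static inline uint32_t reciprocal_scale_sketch(uint32_t val, uint32_t n)
{
    return (uint32_t)(((uint64_t)val * n) >> 32);
}

/* Hypothetical selector over a group of num_socks members: returns the
 * member index chosen for a given flow hash (num_socks must be > 0). */
uint32_t select_index(uint32_t hash, uint16_t num_socks)
{
    return reciprocal_scale_sketch(hash, num_socks);
}
```

In effect, the hash range is cut into num_socks equal bands, and a uniformly distributed hash lands in each band with equal probability.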

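With N udp_socks in a group, UDP datagrams from many clients are spread roughly evenly across them. Datagrams from the same client (the same source address and port) always go to the same socket. So when writing a UDP server, we can start several threads to increase throughput, each reading and writing its own UDP socket. This causes far less contention than having several threads share one UDP socket. (Note that duplicating a UDP socket with dup(2) does not achieve the effect of SO_REUSEPORT. The duplicated fds all refer to the same udp_sock, so contention is not reduced.)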
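The dup(2) remark can be checked from userspace. On Linux every socket is backed by an inode in sockfs. Two descriptors therefore refer to the same socket object when fstat() reports the same st_dev and st_ino. The helper same_socket_object is hypothetical.

```c
#include <sys/socket.h> /* socket(), used by callers of this helper */
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both descriptors refer to the same open socket object,
 * 0 if they refer to different objects, -1 on error. */
int same_socket_object(int fd1, int fd2)
{
    struct stat a, b;
    if (fstat(fd1, &a) < 0 || fstat(fd2, &b) < 0)
        return -1;
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A dup()'d descriptor reports the same inode as the original. A second socket() call reports a different inode, for example one socket per worker thread, each with SO_REUSEPORT and its own bind(). That separate object is what gives each thread its own receive queue.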

Below is the commit message that introduced this for UDP:

commit e32ea7e747271a0abcd37e265005e97cc81d9df5
Author: Craig Gallek <kraig@google.com>
Date:   Mon Jan 4 17:41:46 2016 -0500

    soreuseport: fast reuseport UDP socket selection
    
    Include a struct sock_reuseport instance when a UDP socket binds to
    a specific address for the first time with the reuseport flag set.
    When selecting a socket for an incoming UDP packet, use the information
    available in sock_reuseport if present.
    
    This required adding an additional field to the UDP source address
    equality function to differentiate between exact and wildcard matches.
    The original use case allowed wildcard matches when checking for
    existing port uses during bind.  The new use case of adding a socket
    to a reuseport group requires exact address matching.
    
    Performance test (using a machine with 2 CPU sockets and a total of
    48 cores):  Create reuseport groups of varying size.  Use one socket
    from this group per user thread (pinning each thread to a different
    core) calling recvmmsg in a tight loop.  Record number of messages
    received per second while saturating a 10G link.
      10 sockets: 18% increase (~2.8M -> 3.3M pkts/s)
      20 sockets: 14% increase (~2.9M -> 3.3M pkts/s)
      40 sockets: 13% increase (~3.0M -> 3.4M pkts/s)
    
    This work is based off a similar implementation written by
    Ying Cai <ycai@google.com> for implementing policy-based reuseport
    selection.
    
    Signed-off-by: Craig Gallek <kraig@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
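As a quick check on the quoted figures, the percentages can be recomputed from the rounded packet rates. The helper names below are hypothetical.

```c
/* Percentage increase from before to after, e.g. packets per second. */
double pct_increase(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Rounded to the nearest whole percent (valid for positive increases). */
long pct_increase_rounded(double before, double after)
{
    return (long)(pct_increase(before, after) + 0.5);
}
```

pct_increase_rounded(2.8, 3.3) gives 18, (2.9, 3.3) gives 14, and (3.0, 3.4) gives 13. These match the commit message.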


Added: 2021-08-09 10:33:55  Updated: 2021-08-09 10:35:44
