FedAvg & ASYNCHRONOUS FEDERATED OPTIMIZATION

FedAvg

同步更新方案：

? There is a fixed set of K clients, each with a fixed local dataset.

? At the beginning of each round,a random fraction C of clients is selected, and the server sends the current global algorithm state to each of these
clients (e.g., the current model parameters). We only select a fraction of clients for efficiency, as our experiments show diminishing returns for adding more clients beyond a certain point.

? Each selected client then performs local computation based on the global state and its local dataset, and sends an update to the server.

? The server then applies these updates to its global state, and the process repeats.

引入SGD

最近大量深度学习的成功应用几乎完全依赖于随机梯度下降的变体进行优化；
许多进展可以理解为通过调整模型的结构（和损失函数），使其更易于使用简单的基于梯度的方法进行优化。
因此，我们很自然地从SGD开始建立联邦优化算法

FedAVG计算开销由三个关键参数控制:

C：每轮通信，执行计算的客户端比例，C=1表示所有用户均参与；
E：每轮通信，每个客户对其本地数据集进行的训练次数；
B：用于客户端更新的本地小批量大小。B=∞表示整个本地数据集被视为单个小批次。

基线算法——FedSGD = FedAVG|B=∞,E=1

SGD可以直接应用于联邦优化问题，即每轮通信在随机选择的客户端上进行一次批处理梯度计算
每个客户端使用其本地数据在当前模型上进行一步梯度下降，然后服务器对结果模型进行加权平均。
↓
FedAVG
通过在平均模型之前多次迭代本地更新来为每个客户端增加更多的计算。

FedAVG算法

在这里插入图片描述

模型从相同的初始化开始很重要

在这里插入图片描述
实验说明：
? 使用θw+(1-θ)w’对两个模型w和w’的参数求平均值，研究不同θ时整个MNIST训练集的损失。
? 左图，w和w’使用不同的随机种子初始化；
? 右图，两个模型从相同的初始化开始。
? 水平线给出了由w或w0实现的最佳损耗。

结论：
右图：当从相同初始化开始两个模型，然后在不同数据子集上独立训练每个模型时，发现参数平均的效果出奇地好

FedAsync

同步联邦优化的问题：

the synchronous flavor of federated optimization is potentially unscalable, inefficient, and inflexible.
1、设备过多会导致服务器端网络拥塞。

2、由于计算能力和电池时间有限，任务调度因设备而异，因此很难在每个时期结束时同步所选设备。

3、服务器必须确定一个超时阈值来丢弃掉队者。如果幸存设备的数量太少，服务器可能必须丢弃包含接收更新的整个时期。

解决：异步联邦优化

key idea：use a weighted average to update the global model
The mixing weight can also be set adaptively as a function of the staleness.
每当服务器收到本地模型时，它就会立即更新全局模型。