for i in range(beg, end):
    torch.cuda.manual_seed(0)
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        print("Using GPU")
    print("---------------Running time:------------", i)
torch.manual_seed(args.seed)       # seed the CPU RNG so that the results are deterministic
torch.cuda.manual_seed(args.seed)  # seed the current GPU; with several GPUs, torch.cuda.manual_seed_all() should be used to seed them all
In plain terms: a neural network's parameters have to be initialized, and if they are initialized randomly on every run the results are not comparable; fixing the same random seed guarantees an identical initialization, and therefore comparable runs, every time.
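For reference, a minimal all-in-one seeding helper along these lines (the name set_seed is only an illustration, not something from this codebase):

import random
import numpy as np
import torch

def set_seed(seed):
    # Illustrative helper: seed every RNG that the training code touches.
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy (used later by the data samplers)
    torch.manual_seed(seed)           # CPU RNG (also governs weight initialization)
    torch.cuda.manual_seed(seed)      # current GPU
    torch.cuda.manual_seed_all(seed)  # all GPUs, if more than one is used

set_seed(0)  # two runs with the same seed now start from identical weights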
class NN1_PCA(nn.Module):
    def __init__(self, input_dim=60, output_dim=10):
        super(NN1_PCA, self).__init__()
        self.fc1 = nn.Linear(input_dim, 200)
        self.fc2 = nn.Linear(200, output_dim)

    def forward(self, x):
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
Network structure: 60 — 200 — ReLU — 47 — log-softmax (the 60-dimensional PCA features feed a 200-unit hidden layer; the output layer has one unit per class, 47 here). nn.Linear produces a purely linear output with no activation of its own. The forward method wires the layers together and fixes the order in which data flows through them; when defining a sub-network, an nn.Sequential container can equally be used to guarantee that order.
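As the note says, the same 60 – 200 – ReLU – output architecture can be written with an nn.Sequential container; a minimal sketch (NN1_PCA_Seq is illustrative, not part of the repo):

import torch
import torch.nn as nn

class NN1_PCA_Seq(nn.Module):
    def __init__(self, input_dim=60, output_dim=47):
        super().__init__()
        # The container fixes the execution order, so no explicit layer-by-layer
        # wiring is needed in forward beyond calling the container.
        self.net = nn.Sequential(
            nn.Flatten(),               # (batch, ...) -> (batch, input_dim)
            nn.Linear(input_dim, 200),
            nn.ReLU(),
            nn.Linear(200, output_dim),
            nn.LogSoftmax(dim=1),       # matches F.log_softmax in the original forward
        )

    def forward(self, x):
        return self.net(x)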
if model == "NN1_PCA":
    model = NN1_PCA(input_dim=dim_pca, output_dim=dim_output), model  # tuple: (module instance, model-name string)
add_hooks(model[0])
A hook is a function slot that the framework defines in advance at a fixed point of its normal execution flow (the "hook" it exposes).
To act at that point, we write our own implementation and attach, i.e. register, it to the hook so that the framework calls it when execution reaches that point.
A hook only does something once a function has been registered on it; if nothing is registered, the hook point in the original flow simply executes nothing.
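A minimal sketch of how PyTorch module hooks are registered and removed, which is the mechanism add_hooks builds on to capture the per-layer quantities needed later for per-sample gradients (the _cached_* attribute names are made up for this illustration):

import torch
import torch.nn as nn

def capture_input(module, inputs, output):
    # Forward hook: remember the layer input (needed to reconstruct per-sample gradients).
    module._cached_input = inputs[0].detach()

def capture_grad_output(module, grad_input, grad_output):
    # Backward hook: remember the gradient w.r.t. the layer output.
    module._cached_grad_output = grad_output[0].detach()

layer = nn.Linear(60, 47)
h1 = layer.register_forward_hook(capture_input)
h2 = layer.register_full_backward_hook(capture_grad_output)

x = torch.randn(8, 60)
layer(x).sum().backward()
print(layer._cached_input.shape, layer._cached_grad_output.shape)  # torch.Size([8, 60]) torch.Size([8, 47])

h1.remove()  # unregister: the hook points go back to doing nothing
h2.remove()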
class UserAVG(User):
    def __init__(self, numeric_id, train_data, test_data, model, sample_ratio, learning_rate, L, local_updates,
                 dp, times, use_cuda):
        super().__init__(numeric_id, train_data, test_data, model[0], sample_ratio, learning_rate, L,
                         local_updates, dp, times, use_cuda)

        if model[1] == 'mclr':
            self.loss = nn.NLLLoss()
        else:
            self.loss = nn.CrossEntropyLoss()

        param_groups = [{'params': p, 'lr': self.learning_rate} for p in self.model.parameters()]
        self.optimizer = FedAvgOptimizer(param_groups, lr=self.learning_rate, weight_decay=L)
        self.csi = None

    def set_grads(self, new_grads):
        if isinstance(new_grads, nn.Parameter):
            for model_grad, new_grad in zip(self.model.parameters(), new_grads):
                model_grad.data = new_grad.data
        elif isinstance(new_grads, list):
            for idx, model_grad in enumerate(self.model.parameters()):
                model_grad.data = new_grads[idx]
    def train_no_dp(self, glob_iter):
        """Training phase without differential privacy"""
        for epoch in range(1, self.local_updates + 1):
            self.model.train()
            # Re-seed so that every (run, round, epoch) combination draws a distinct
            # but reproducible mini-batch.
            np.random.seed(500 * (self.times + 1) * (glob_iter + 1) + epoch + 1)
            torch.manual_seed(500 * (self.times + 1) * (glob_iter + 1) + epoch + 1)
            train_idx = np.arange(self.train_samples)
            train_sampler = SubsetRandomSampler(train_idx)
            self.trainloader = DataLoader(self.train_data, self.batch_size, sampler=train_sampler)
            X, y = list(self.trainloader)[0]  # take a single mini-batch (equivalent to next(iter(self.trainloader)))
            if self.use_cuda:
                X, y = X.cuda(), y.cuda()

            self.optimizer.zero_grad()
            clear_backprops(self.model)
            output = self.model(X)
            loss = self.loss(output, y)
            loss.backward()
            self.optimizer.step()
            if self.scheduler:
                self.scheduler.step()

        # Model delta sent back to the server: local model minus the global model it started from.
        for local, server, delta in zip(self.model.parameters(), self.server_model, self.delta_model):
            delta.data = local.data.detach() - server.data.detach()
        return loss
    def train_dp(self, sigma_g, glob_iter, max_norm):
        """Training phase under differential privacy"""
        for epoch in range(1, self.local_updates + 1):
            self.model.train()
            np.random.seed(500 * (self.times + 1) * (glob_iter + 1) + epoch + 1)
            torch.manual_seed(500 * (self.times + 1) * (glob_iter + 1) + epoch + 1)
            train_idx = np.arange(self.train_samples)
            train_sampler = SubsetRandomSampler(train_idx)
            self.trainloader = DataLoader(self.train_data, self.batch_size, sampler=train_sampler)
            X, y = list(self.trainloader)[0]
            if self.use_cuda:
                X, y = X.cuda(), y.cuda()

            self.optimizer.zero_grad()
            clear_backprops(self.model)
            output = self.model(X)
            loss = self.loss(output, y)
            loss.backward(retain_graph=True)
            compute_grad1(self.model)  # populate p.grad1 with per-sample gradients

            for p in self.model.parameters():
                # Clip each per-sample gradient to the median of the per-sample norms,
                # average the clipped gradients, then privatize the average with Gaussian noise.
                max_norm = np.median([float(grad.data.norm(2)) for grad in p.grad1])
                p.grad1 = torch.stack(
                    [grad / max(1, float(grad.data.norm(2)) / max_norm) for grad in p.grad1])
                p.grad.data = torch.mean(p.grad1, dim=0)
                p.grad.data = GaussianMechanism(p.grad.data, sigma_g, max_norm, self.batch_size, self.use_cuda)

            self.optimizer.step()
            if self.scheduler:
                self.scheduler.step()

        for local, server, delta in zip(self.model.parameters(), self.server_model, self.delta_model):
            delta.data = local.data.detach() - server.data.detach()
        return 0
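For intuition, here is a self-contained sketch of the clip-average-noise step that train_dp performs for one parameter tensor. The noise scale sigma * clip_norm / batch_size is an assumption about what GaussianMechanism does (the standard DP-SGD calibration) and may not match the repo's exact implementation:

import torch

def gaussian_mechanism_sketch(avg_grad, sigma, clip_norm, batch_size):
    # Assumed behaviour: add isotropic Gaussian noise with std sigma * clip_norm / batch_size
    # to an already clipped-and-averaged gradient.
    noise_std = sigma * clip_norm / batch_size
    return avg_grad + noise_std * torch.randn_like(avg_grad)

# Per-sample gradients for one parameter: shape (batch_size, *param_shape)
per_sample_grads = torch.randn(32, 200, 60)
clip_norm = per_sample_grads.flatten(1).norm(dim=1).median().item()

# Clip each per-sample gradient so its l2 norm is at most clip_norm ...
scale = torch.clamp(per_sample_grads.flatten(1).norm(dim=1) / clip_norm, min=1.0)
clipped = per_sample_grads / scale.view(-1, 1, 1)

# ... average them, then privatize the average.
private_grad = gaussian_mechanism_sketch(clipped.mean(dim=0), sigma=1.0,
                                         clip_norm=clip_norm, batch_size=32)
print(private_grad.shape)  # torch.Size([200, 60])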
class CrossEntropyLoss(_WeightedLoss):
    __constants__ = ['ignore_index', 'reduction']

    def __init__(self, weight=None, size_average=None, ignore_index=-100,
                 reduce=None, reduction='mean'):
        super(CrossEntropyLoss, self).__init__(weight, size_average, reduce, reduction)
        self.ignore_index = ignore_index

    def forward(self, input, target):
        return F.cross_entropy(input, target, weight=self.weight,
                               ignore_index=self.ignore_index, reduction=self.reduction)
This criterion combines :func:`nn.LogSoftmax` and :func:`nn.NLLLoss` in one single class.
It is useful when training a classification problem with `C` classes.
If provided, the optional argument :attr:`weight` should be a 1D `Tensor` assigning weight to each of the classes.
This is particularly useful when you have an unbalanced training set.
The `input` is expected to contain raw, unnormalized scores for each class.
`input` has to be a Tensor of size either :math:`(minibatch, C)` or
:math:`(minibatch, C, d_1, d_2, ..., d_K)`
with :math:`K \geq 1` for the `K`-dimensional case (described later).
This criterion expects a class index in the range :math:`[0, C-1]` as the
`target` for each value of a 1D tensor of size `minibatch`; if `ignore_index`
is specified, this criterion also accepts this class index (this index may not
necessarily be in the class range).
The loss can be described as:
.. math::
\text{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right)
= -x[class] + \log\left(\sum_j \exp(x[j])\right)
or in the case of the :attr:`weight` argument being specified:
.. math::
\text{loss}(x, class) = weight[class] \left(-x[class] + \log\left(\sum_j \exp(x[j])\right)\right)
The losses are averaged across observations for each minibatch.
Can also be used for higher dimension inputs, such as 2D images, by providing
an input of size :math:`(minibatch, C, d_1, d_2, ..., d_K)` with :math:`K \geq 1`,
where :math:`K` is the number of dimensions, and a target of appropriate shape
(see below).
Args:
weight (Tensor, optional): a manual rescaling weight given to each class.
If given, has to be a Tensor of size `C`
size_average (bool, optional): Deprecated (see :attr:`reduction`). By default,
the losses are averaged over each loss element in the batch. Note that for
some losses, there are multiple elements per sample. If the field :attr:`size_average`
is set to ``False``, the losses are instead summed for each minibatch. Ignored
when reduce is ``False``. Default: ``True``
ignore_index (int, optional): Specifies a target value that is ignored
and does not contribute to the input gradient. When :attr:`size_average` is
``True``, the loss is averaged over non-ignored targets.
reduce (bool, optional): Deprecated (see :attr:`reduction`). By default, the
losses are averaged or summed over observations for each minibatch depending
on :attr:`size_average`. When :attr:`reduce` is ``False``, returns a loss per
batch element instead and ignores :attr:`size_average`. Default: ``True``
reduction (string, optional): Specifies the reduction to apply to the output:
``'none'`` | ``'mean'`` | ``'sum'``. ``'none'``: no reduction will be applied,
``'mean'``: the sum of the output will be divided by the number of
elements in the output, ``'sum'``: the output will be summed. Note: :attr:`size_average`
and :attr:`reduce` are in the process of being deprecated, and in the meantime,
specifying either of those two args will override :attr:`reduction`. Default: ``'mean'``
Shape:
- Input: :math:`(N, C)` where `C = number of classes`, or
:math:`(N, C, d_1, d_2, ..., d_K)` with :math:`K \geq 1`
in the case of `K`-dimensional loss.
- Target: :math:`(N)` where each value is :math:`0 \leq \text{targets}[i] \leq C-1`, or
:math:`(N, d_1, d_2, ..., d_K)` with :math:`K \geq 1` in the case of
K-dimensional loss.
- Output: scalar.
If :attr:`reduction` is ``'none'``, then the same size as the target:
:math:`(N)`, or
:math:`(N, d_1, d_2, ..., d_K)` with :math:`K \geq 1` in the case
of K-dimensional loss.
Examples::
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
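One point worth checking explicitly: F.cross_entropy applies log_softmax internally, so cross_entropy(logits, y) equals nll_loss(log_softmax(logits), y); this is presumably why the 'mclr' branch earlier pairs a log-softmax output with nn.NLLLoss. A quick verification, which also explains the initial loss of roughly 3.85 in the logs below (a near-uniform prediction over 47 classes gives a loss of about log 47):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 47)                 # raw, unnormalized scores
target = torch.randint(0, 47, (3,))

ce  = F.cross_entropy(logits, target)       # applies log_softmax internally
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(torch.allclose(ce, nll))              # True
print(torch.log(torch.tensor(47.0)))        # tensor(3.8501)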
class FedAvgOptimizer(FedLOptimizer):
    def __init__(self, params, lr, weight_decay):
        super().__init__(params, lr, weight_decay)

    def step(self, closure=None):
        loss = None
        if closure is not None:
            loss = closure()
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                # Plain gradient-descent update: p <- p - lr * grad
                p.data = p.data - p.grad.data * group['lr']
        return loss
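The update above is vanilla gradient descent, p <- p - lr * grad, with no momentum; the weight_decay argument is stored but not applied inside this step. A small sketch checking the update against torch.optim.SGD:

import torch
import torch.nn as nn

torch.manual_seed(0)
lr = 0.1
model_a = nn.Linear(60, 47)
model_b = nn.Linear(60, 47)
model_b.load_state_dict(model_a.state_dict())   # identical starting point

x = torch.randn(8, 60)
y = torch.randint(0, 47, (8,))
loss_fn = nn.CrossEntropyLoss()

# Manual update, mirroring FedAvgOptimizer.step(): p <- p - lr * grad
loss_fn(model_a(x), y).backward()
with torch.no_grad():
    for p in model_a.parameters():
        p -= lr * p.grad

# Reference update with torch.optim.SGD (no momentum, no weight decay)
opt = torch.optim.SGD(model_b.parameters(), lr=lr)
loss_fn(model_b(x), y).backward()
opt.step()

print(all(torch.allclose(pa, pb) for pa, pb in zip(model_a.parameters(), model_b.parameters())))  # True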
def train(self):
    loss = []
    for glob_iter in range(self.num_glob_iters):
        print("-------------Round number: ", glob_iter, " -------------")
        self.send_parameters()
        self.evaluate()
        if self.noise:
            self.selected_users = self.select_transmitting_users()
            print(f"Transmitting {len(self.selected_users)} users")
        else:
            self.selected_users = self.select_users(glob_iter, self.users_per_round)
        for user in self.selected_users:
            if self.dp == "None":
                user.train_no_dp(glob_iter)
            else:
                user.train_dp(self.sigma_g, glob_iter, self.max_norm)
            user.drop_lr()
        self.aggregate_parameters()
        self.get_max_norm()
        if self.noise:
            self.apply_channel_effect()
    self.save_results()
    self.save_norms()
    self.save_model()
The outer communication loop runs for num_glob_iters = 250 rounds.
Within each round, every selected user runs the train_no_dp (or train_dp) method shown earlier for local_updates = 10 local iterations before the server aggregates.
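For orientation, a generic sketch of what FedAvg-style aggregation does with the user deltas computed in train_no_dp/train_dp, with weights proportional to each user's sample count; the repo's aggregate_parameters may well differ in detail:

import torch

def fedavg_aggregate(global_params, user_deltas, user_num_samples):
    # Generic FedAvg aggregation sketch:
    # new_global = global + sum_k (n_k / n_total) * delta_k
    total = float(sum(user_num_samples))
    for k, deltas in enumerate(user_deltas):
        weight = user_num_samples[k] / total
        for g, d in zip(global_params, deltas):
            g.data.add_(weight * d.data)
    return global_params

# Toy usage: two users, one 3-element "model".
global_params = [torch.zeros(3)]
user_deltas = [[torch.tensor([1.0, 1.0, 1.0])], [torch.tensor([-1.0, 0.0, 1.0])]]
print(fedavg_aggregate(global_params, user_deltas, user_num_samples=[10, 30]))
# [tensor([-0.5000,  0.2500,  1.0000])]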
Similarity: 1.0
====================================
Algorithm: FedAvg
(All figures below are the "Average Global" metrics printed by the training script;
GD = Training Gradient Dissimilarity.)

  Round   Test Acc   Test Loss   Train Acc   F(x_t)-F(x*)   Train Loss   GD         GD (mean of norms)   GD (norm of mean)
  0       0.02385    3.84941     0.02289     3.6905         3.85023      0.00117    0.00421              0.00304
  249     0.69255    1.07392     0.69969     0.89725        1.05698      0.01182    0.01604              0.00421

Std Max Accuracy: 0.00037043517951487363   Mean Max Accuracy: 0.6947666666666666
====================================
Algorithm: SCAFFOLD

  Round   Test Acc   Test Loss   Train Acc   F(x_t)-F(x*)   Train Loss   GD         GD (mean of norms)   GD (norm of mean)
  0       0.0246     3.84992     0.02369     3.69058        3.85031      0.00121    0.00452              0.00331
  249     0.6952     1.068       0.70214     0.89042        1.05015      0.01158    0.01338              0.0018

Std Max Accuracy: 0.00037932688922471   Mean Max Accuracy: 0.6950333333333333

Similarity: 0.1
====================================
Algorithm: FedAvg

  Round   Test Acc   Test Loss   Train Acc   F(x_t)-F(x*)   Train Loss   GD         GD (mean of norms)   GD (norm of mean)
  0       0.02618    3.85023     0.02662     3.75271        3.84991      1.54572    1.55486              0.00913
  249     0.57521    1.4556      0.57695     1.35535        1.45255      6.6318     6.72388              0.09208

Std Max Accuracy: 0.003339922903454331   Mean Max Accuracy: 0.5943915018615296
====================================
Algorithm: SCAFFOLD

  Round   Test Acc   Test Loss   Train Acc   F(x_t)-F(x*)   Train Loss   GD         GD (mean of norms)   GD (norm of mean)
  0       0.3033     3.84531     0.03012     3.7487         3.8459       1.51894    1.52739              0.00844
  249     0.62906    1.30998     0.63091     1.21077        1.30797      19.33952   19.44003             0.10051

Std Max Accuracy: 0.002479936340467419   Mean Max Accuracy: 0.6366772551832497

Similarity: 0.0
====================================
Algorithm: FedAvg

  Round   Test Acc   Test Loss   Train Acc   F(x_t)-F(x*)   Train Loss   GD         GD (mean of norms)   GD (norm of mean)
  0       0.3094     3.84465     0.03056     3.76351        3.84495      1.88936    1.8992               0.00984
  249     0.36521    2.12814     0.36927     2.03785        2.11928      11.32929   11.61684             0.28755
====================================
Algorithm: SCAFFOLD

  Round   Test Acc   Test Loss   Train Acc   F(x_t)-F(x*)   Train Loss   GD         GD (mean of norms)   GD (norm of mean)
  0       0.02005    3.84892     0.02082     3.76703        3.84847      1.92658    1.93722              0.01065
  249     0.51411    1.8632      0.51596     1.76778        1.84922      53.76546   54.39637             0.63091