
[Artificial Intelligence] HBU-NNDL Experiment 5: Feedforward Neural Networks (2), Automatic Gradient Computation & Optimization Problems



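4.3 Automatic Gradient Computation

4.3.1 Reimplementing the binary classification task with PyTorch's predefined operators
    Implementing the feedforward neural network model
    Completing the Runner class
    Model training

4.3.2 Adding a hidden layer with 3 neurons, running the binary classification again, and comparing with 4.3.1
    Building a neural network model with two hidden layers
    Model training

4.4 Optimization Problems

4.4.1 Parameter initialization

4.4.2 The vanishing gradient problem
    Model construction
    Training with the Sigmoid function
    Training with the ReLU function

4.4.3 The dying ReLU problem
    Training with ReLU
    Training with Leaky ReLU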



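4.3 Automatic Gradient Computation


The first core class of the autograd package is Tensor. If its ".requires_grad" attribute is set to True, autograd starts tracking every operation performed on it, which means gradients can be propagated with the chain rule. Once the computation is finished, calling ".backward()" computes all the gradients, and the gradient of this Tensor is accumulated into its ".grad" attribute.

[Note: when calling y.backward(), no argument is needed if y is a scalar; otherwise, a Tensor with the same shape as y must be passed in.]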
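As a minimal sketch of the behavior described above, the following tracks a toy tensor, shows ".grad" accumulating across two backward passes, and passes a same-shape tensor to backward() for a non-scalar output. The tensor `x` and its values are made up for illustration and are not part of the original experiment.

```python
import torch

# Hypothetical toy input, chosen only for illustration.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Scalar output: backward() needs no argument.
y = (x ** 2).sum()
y.backward()
grad_after_first = x.grad.clone()       # dy/dx = 2x

# A second backward pass adds to .grad instead of overwriting it.
y2 = (x ** 2).sum()
y2.backward()
grad_after_second = x.grad.clone()      # 2x + 2x

# Reset the accumulated gradient before the next experiment.
x.grad.zero_()

# Non-scalar output: pass a tensor with the same shape as z.
z = x * 3
z.backward(torch.ones_like(z))
grad_vector = x.grad.clone()            # dz_i/dx_i = 3

print(grad_after_first, grad_after_second, grad_vector)
```

The accumulation is why a training loop clears the gradients after each parameter update.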

4.3.1 Reimplementing the binary classification task with PyTorch's predefined operators

Implementing the feedforward neural network model


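import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.init import constant_, normal_

class Model_MLP_L2_V2(torch.nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Model_MLP_L2_V2, self).__init__()
        # Define the linear layers with 'torch.nn.Linear'.
        # The first argument (in_features) is the input dimension; the second (out_features) is the output dimension.
        # weight is the weight parameter; 'torch.nn.init.normal_' initializes it from a Gaussian distribution
        # bias is the bias parameter; 'torch.nn.init.constant_' initializes it to a constant
        # (only fc2 is initialized explicitly here; fc1 keeps the default nn.Linear initialization)
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, output_size)
        normal_(tensor=self.fc2.weight, mean=0., std=1.)
        constant_(tensor=self.fc2.bias, val=0.0)
        # Use 'torch.nn.functional.sigmoid' as the Logistic activation function
        self.act_fn = F.sigmoid

    # Forward pass
    def forward(self, inputs):
        z1 = self.fc1(inputs)
        a1 = self.act_fn(z1)
        z2 = self.fc2(a1)
        a2 = self.act_fn(z2)
        return a2

Completing the Runner class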

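Building on the RunnerV2_1 class implemented in the previous section, the RunnerV2_2 class in this section uses automatic gradient computation during training. When saving the model, it obtains the parameters with the state_dict method; when loading the model, it restores them with the load_state_dict method.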
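The save and load steps can be sketched in isolation as follows. This is only an illustration under assumptions: the two `nn.Linear` models are hypothetical stand-ins for the experiment's network, and an in-memory buffer replaces the file path so that nothing is written to disk.

```python
import io
import torch

# Hypothetical stand-in models with the same architecture.
model_a = torch.nn.Linear(2, 1)
model_b = torch.nn.Linear(2, 1)

# Save: state_dict() returns a mapping from parameter names to tensors.
buffer = io.BytesIO()
torch.save(model_a.state_dict(), buffer)

# Load: read the mapping back and copy it into the second model.
buffer.seek(0)
state_dict = torch.load(buffer)
model_b.load_state_dict(state_dict)

# After loading, both models hold identical parameters.
same = all(torch.equal(p, q)
           for p, q in zip(model_a.parameters(), model_b.parameters()))
print(sorted(state_dict.keys()), same)
```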

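class RunnerV2_2(object):
    def __init__(self, model, optimizer, metric, loss_fn, **kwargs):
        self.model = model
        self.optimizer = optimizer
        self.loss_fn = loss_fn
        self.metric = metric

        # Record how the evaluation metric changes during training
        self.train_scores = []
        self.dev_scores = []

        # Record how the loss changes during training
        self.train_loss = []
        self.dev_loss = []

    def train(self, train_set, dev_set, **kwargs):
        # Switch the model to training mode
        self.model.train()

        # Number of training epochs; defaults to 0 if not given
        num_epochs = kwargs.get("num_epochs", 0)
        # Logging frequency; defaults to 100 if not given
        log_epochs = kwargs.get("log_epochs", 100)
        # Path for saving the model; defaults to "best_model.pdparams" if not given
        save_path = kwargs.get("save_path", "best_model.pdparams")

        # Custom logging function; defaults to None if not given
        custom_print_log = kwargs.get("custom_print_log", None)

        # Track the best score seen so far
        best_score = 0
        # Train for num_epochs epochs
        for epoch in range(num_epochs):
            X, y = train_set
            # Get the model predictions
            logits = self.model(X)
            # Compute the cross-entropy loss
            trn_loss = self.loss_fn(logits, y)
            self.train_loss.append(trn_loss.item())
            # Compute the evaluation metric
            trn_score = self.metric(logits, y).item()
            self.train_scores.append(trn_score)

            # Compute the parameter gradients automatically
            trn_loss.backward()
            if custom_print_log is not None:
                # Print per-layer information (for example, the gradients of each layer)
                custom_print_log(self)

            # Update the parameters
            self.optimizer.step()
            # Clear the gradients
            self.optimizer.zero_grad()

            dev_score, dev_loss = self.evaluate(dev_set)
            # Save the model if the current score is the best so far
            if dev_score > best_score:
                self.save_model(save_path)
                print(f"[Evaluate] best accuracy performence has been updated: {best_score:.5f} --> {dev_score:.5f}")
                best_score = dev_score

            if log_epochs and epoch % log_epochs == 0:
                print(f"[Train] epoch: {epoch}/{num_epochs}, loss: {trn_loss.item()}")

    # Evaluation stage: 'torch.no_grad()' disables gradient computation and storage
    @torch.no_grad()
    def evaluate(self, data_set):
        # Switch the model to evaluation mode
        self.model.eval()

        X, y = data_set
        # Compute the model output
        logits = self.model(X)
        # Compute the loss
        loss = self.loss_fn(logits, y).item()
        self.dev_loss.append(loss)
        # Compute the evaluation metric
        score = self.metric(logits, y).item()
        self.dev_scores.append(score)
        return score, loss

    # Test stage: 'torch.no_grad()' disables gradient computation and storage
    @torch.no_grad()
    def predict(self, X):
        # Switch the model to evaluation mode
        self.model.eval()
        return self.model(X)

    # Get the model parameters with 'model.state_dict()' and save them
    def save_model(self, saved_path):
        torch.save(self.model.state_dict(), saved_path)

    # Load the model parameters with 'model.load_state_dict'
    def load_model(self, model_path):
        state_dict = torch.load(model_path)
        self.model.load_state_dict(state_dict)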


from metric import accuracy
from dataset import make_moons

n_samples = 1000
X, y = make_moons(n_samples=n_samples, shuffle=True, noise=0.15)

num_train = 640
num_dev = 160
num_test = 200

X_train, y_train = X[:num_train], y[:num_train]
X_dev, y_dev = X[num_train:num_train + num_dev], y[num_train:num_train + num_dev]
X_test, y_test = X[num_train + num_dev:], y[num_train + num_dev:]

y_train = y_train.reshape([-1,1])
y_dev = y_dev.reshape([-1,1])
y_test = y_test.reshape([-1,1])

# Set up the model
input_size = 2
hidden_size = 5
output_size = 1
model = Model_MLP_L2_V2(input_size=input_size, hidden_size=hidden_size, output_size=output_size)

# Set up the loss function
loss_fn = F.binary_cross_entropy

# Set up the optimizer
learning_rate = 0.2
optimizer = torch.optim.SGD(model.parameters(),lr=learning_rate)

# Set up the evaluation metric
metric = accuracy

# Other parameters
epoch_num = 1000
saved_path = 'best_model.pdparams'

# Instantiate the RunnerV2_2 class and pass in the training configuration
runner = RunnerV2_2(model, optimizer, metric, loss_fn)

runner.train([X_train, y_train], [X_dev, y_dev], num_epochs=epoch_num, log_epochs=50, save_path="best_model.pdparams")

[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.55000
[Train] epoch: 0/1000, loss: 0.672812819480896
[Evaluate] best accuracy performence has been updated: 0.55000 --> 0.56250
[Evaluate] best accuracy performence has been updated: 0.56250 --> 0.57500
[Evaluate] best accuracy performence has been updated: 0.57500 --> 0.58125
[Evaluate] best accuracy performence has been updated: 0.58125 --> 0.59375
[Evaluate] best accuracy performence has been updated: 0.59375 --> 0.60000
[Evaluate] best accuracy performence has been updated: 0.60000 --> 0.61250
[Evaluate] best accuracy performence has been updated: 0.61250 --> 0.61875
[Evaluate] best accuracy performence has been updated: 0.61875 --> 0.62500
[Evaluate] best accuracy performence has been updated: 0.62500 --> 0.63750
[Evaluate] best accuracy performence has been updated: 0.63750 --> 0.64375
[Evaluate] best accuracy performence has been updated: 0.64375 --> 0.65000
[Evaluate] best accuracy performence has been updated: 0.65000 --> 0.65625
[Evaluate] best accuracy performence has been updated: 0.65625 --> 0.66250
[Evaluate] best accuracy performence has been updated: 0.66250 --> 0.67500
[Evaluate] best accuracy performence has been updated: 0.67500 --> 0.68125
[Evaluate] best accuracy performence has been updated: 0.68125 --> 0.68750
[Evaluate] best accuracy performence has been updated: 0.68750 --> 0.69375
[Evaluate] best accuracy performence has been updated: 0.69375 --> 0.70000
[Evaluate] best accuracy performence has been updated: 0.70000 --> 0.70625
[Evaluate] best accuracy performence has been updated: 0.70625 --> 0.71250
[Evaluate] best accuracy performence has been updated: 0.71250 --> 0.71875
[Evaluate] best accuracy performence has been updated: 0.71875 --> 0.72500
[Evaluate] best accuracy performence has been updated: 0.72500 --> 0.73125
[Evaluate] best accuracy performence has been updated: 0.73125 --> 0.73750
[Train] epoch: 50/1000, loss: 0.5244487524032593
[Evaluate] best accuracy performence has been updated: 0.73750 --> 0.74375
[Evaluate] best accuracy performence has been updated: 0.74375 --> 0.75000
[Evaluate] best accuracy performence has been updated: 0.75000 --> 0.75625
[Evaluate] best accuracy performence has been updated: 0.75625 --> 0.76250
[Evaluate] best accuracy performence has been updated: 0.76250 --> 0.76875
[Evaluate] best accuracy performence has been updated: 0.76875 --> 0.77500
[Evaluate] best accuracy performence has been updated: 0.77500 --> 0.78125
[Train] epoch: 100/1000, loss: 0.44568243622779846
[Evaluate] best accuracy performence has been updated: 0.78125 --> 0.78750
[Evaluate] best accuracy performence has been updated: 0.78750 --> 0.79375
[Evaluate] best accuracy performence has been updated: 0.79375 --> 0.80000
[Evaluate] best accuracy performence has been updated: 0.80000 --> 0.80625
[Evaluate] best accuracy performence has been updated: 0.80625 --> 0.81250
[Evaluate] best accuracy performence has been updated: 0.81250 --> 0.81875
[Train] epoch: 150/1000, loss: 0.39583656191825867
[Evaluate] best accuracy performence has been updated: 0.81875 --> 0.82500
[Evaluate] best accuracy performence has been updated: 0.82500 --> 0.83125
[Evaluate] best accuracy performence has been updated: 0.83125 --> 0.83750
[Evaluate] best accuracy performence has been updated: 0.83750 --> 0.84375
[Evaluate] best accuracy performence has been updated: 0.84375 --> 0.85000
[Train] epoch: 200/1000, loss: 0.3631165325641632
[Evaluate] best accuracy performence has been updated: 0.85000 --> 0.85625
[Evaluate] best accuracy performence has been updated: 0.85625 --> 0.86250
[Evaluate] best accuracy performence has been updated: 0.86250 --> 0.86875
[Train] epoch: 250/1000, loss: 0.34118399024009705
[Evaluate] best accuracy performence has been updated: 0.86875 --> 0.87500
[Evaluate] best accuracy performence has been updated: 0.87500 --> 0.88125
[Evaluate] best accuracy performence has been updated: 0.88125 --> 0.88750
[Train] epoch: 300/1000, loss: 0.3262239098548889
[Train] epoch: 350/1000, loss: 0.31585538387298584
[Train] epoch: 400/1000, loss: 0.3085569739341736
[Train] epoch: 450/1000, loss: 0.3033401072025299
[Train] epoch: 500/1000, loss: 0.2995539605617523
[Train] epoch: 550/1000, loss: 0.2967640161514282
[Train] epoch: 600/1000, loss: 0.29467660188674927
[Train] epoch: 650/1000, loss: 0.293090283870697
[Train] epoch: 700/1000, loss: 0.2918652892112732
[Train] epoch: 750/1000, loss: 0.29090332984924316
[Train] epoch: 800/1000, loss: 0.29013437032699585
[Train] epoch: 850/1000, loss: 0.28950807452201843
[Train] epoch: 900/1000, loss: 0.2889878451824188
[Train] epoch: 950/1000, loss: 0.2885465621948242


import matplotlib.pyplot as plt

# Visualize how the metrics change on the training and validation sets
def plot(runner, fig_name):
    plt.figure(figsize=(10, 5))
    epochs = list(range(len(runner.train_scores)))

    plt.subplot(1, 2, 1)
    plt.plot(epochs, runner.train_loss, color='#e4007f', label="Train loss")
    plt.plot(epochs, runner.dev_loss, color='#f19ec2', linestyle='--', label="Dev loss")
    # Axis labels and legend
    plt.ylabel("loss", fontsize='large')
    plt.xlabel("epoch", fontsize='large')
    plt.legend(loc='upper right', fontsize='x-large')

    plt.subplot(1, 2, 2)
    plt.plot(epochs, runner.train_scores, color='#e4007f', label="Train accuracy")
    plt.plot(epochs, runner.dev_scores, color='#f19ec2', linestyle='--', label="Dev accuracy")
    # Axis labels and legend
    plt.ylabel("score", fontsize='large')
    plt.xlabel("epoch", fontsize='large')
    plt.legend(loc='lower right', fontsize='x-large')

    # Save the figure under the given file name, then display it
    plt.savefig(fig_name)
    plt.show()


plot(runner, 'fw-acc.pdf')



# Evaluate the model
score, loss = runner.evaluate([X_test, y_test])
print("[Test] score/loss: {:.4f}/{:.4f}".format(score, loss))

[Test] score/loss: 0.8850/0.2583

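4.3.2 Adding a hidden layer with 3 neurons, running the binary classification again, and comparing with 4.3.1

Building a neural network model with two hidden layers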

class Model_MLP_L5(torch.nn.Module):
    def __init__(self, input_size, output_size,mean_init=0.,std_init=1.,b_init=0.0):
        super(Model_MLP_L5, self).__init__()
        self.fc1 = torch.nn.Linear(input_size, 3)
        normal_(tensor=self.fc1.weight, mean=mean_init, std=std_init)
        constant_(tensor=self.fc1.bias, val=b_init)
        self.fc2 = torch.nn.Linear(3, 3)
        normal_(tensor=self.fc2.weight, mean=mean_init, std=std_init)
        constant_(tensor=self.fc2.bias, val=b_init)
        self.fc3 = torch.nn.Linear(3, output_size)
        normal_(tensor=self.fc3.weight, mean=mean_init, std=std_init)
        constant_(tensor=self.fc3.bias, val=b_init)
        # Use 'torch.nn.functional.sigmoid' as the Logistic activation function
        self.act = F.sigmoid

    # Forward pass
    def forward(self, inputs):
        outputs = self.fc1(inputs)
        outputs = self.act(outputs)
        outputs = self.fc2(outputs)
        outputs = self.act(outputs)
        outputs = self.fc3(outputs)
        outputs = self.act(outputs)
        return outputs

Model training

# Set up the model
input_size = 2
output_size = 1
model = Model_MLP_L5(input_size=input_size, output_size=output_size)

# Set up the loss function
loss_fn = F.binary_cross_entropy

# Set up the optimizer
learning_rate = 0.2
optimizer = torch.optim.SGD(model.parameters(),lr=learning_rate)

# Set up the evaluation metric
metric = accuracy

# Other parameters
epoch_num = 1000
saved_path = 'best_model.pdparams'

# Instantiate the RunnerV2_2 class and pass in the training configuration
runner = RunnerV2_2(model, optimizer, metric, loss_fn)

runner.train([X_train, y_train], [X_dev, y_dev], num_epochs=epoch_num, log_epochs=50, save_path="best_model.pdparams")

[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.51875
[Train] epoch: 0/1000, loss: 0.89256751537323
[Evaluate] best accuracy performence has been updated: 0.51875 --> 0.55000
[Evaluate] best accuracy performence has been updated: 0.55000 --> 0.56250
[Evaluate] best accuracy performence has been updated: 0.56250 --> 0.57500
[Evaluate] best accuracy performence has been updated: 0.57500 --> 0.61875
[Evaluate] best accuracy performence has been updated: 0.61875 --> 0.64375
[Evaluate] best accuracy performence has been updated: 0.64375 --> 0.65000
[Evaluate] best accuracy performence has been updated: 0.65000 --> 0.65625
[Evaluate] best accuracy performence has been updated: 0.65625 --> 0.68125
[Evaluate] best accuracy performence has been updated: 0.68125 --> 0.68750
[Evaluate] best accuracy performence has been updated: 0.68750 --> 0.70625
[Evaluate] best accuracy performence has been updated: 0.70625 --> 0.72500
[Train] epoch: 50/1000, loss: 0.6718729734420776
[Evaluate] best accuracy performence has been updated: 0.72500 --> 0.73125
[Evaluate] best accuracy performence has been updated: 0.73125 --> 0.73750
[Evaluate] best accuracy performence has been updated: 0.73750 --> 0.74375
[Evaluate] best accuracy performence has been updated: 0.74375 --> 0.75000
[Train] epoch: 100/1000, loss: 0.6418899297714233
[Evaluate] best accuracy performence has been updated: 0.75000 --> 0.75625
[Train] epoch: 150/1000, loss: 0.6082024574279785
[Evaluate] best accuracy performence has been updated: 0.75625 --> 0.76250
[Evaluate] best accuracy performence has been updated: 0.76250 --> 0.76875
[Evaluate] best accuracy performence has been updated: 0.76875 --> 0.77500
[Train] epoch: 200/1000, loss: 0.5676448941230774
[Evaluate] best accuracy performence has been updated: 0.77500 --> 0.78125
[Evaluate] best accuracy performence has been updated: 0.78125 --> 0.78750
[Evaluate] best accuracy performence has been updated: 0.78750 --> 0.79375
[Train] epoch: 250/1000, loss: 0.5215781927108765
[Evaluate] best accuracy performence has been updated: 0.79375 --> 0.80000
[Evaluate] best accuracy performence has been updated: 0.80000 --> 0.80625
[Evaluate] best accuracy performence has been updated: 0.80625 --> 0.81250
[Train] epoch: 300/1000, loss: 0.47454261779785156
[Evaluate] best accuracy performence has been updated: 0.81250 --> 0.81875
[Evaluate] best accuracy performence has been updated: 0.81875 --> 0.82500
[Evaluate] best accuracy performence has been updated: 0.82500 --> 0.83125
[Evaluate] best accuracy performence has been updated: 0.83125 --> 0.83750
[Evaluate] best accuracy performence has been updated: 0.83750 --> 0.85000
[Evaluate] best accuracy performence has been updated: 0.85000 --> 0.85625
[Train] epoch: 350/1000, loss: 0.43142348527908325
[Evaluate] best accuracy performence has been updated: 0.85625 --> 0.86250
[Evaluate] best accuracy performence has been updated: 0.86250 --> 0.86875
[Train] epoch: 400/1000, loss: 0.39497217535972595
[Evaluate] best accuracy performence has been updated: 0.86875 --> 0.87500
[Evaluate] best accuracy performence has been updated: 0.87500 --> 0.88125
[Train] epoch: 450/1000, loss: 0.3656612038612366
[Evaluate] best accuracy performence has been updated: 0.88125 --> 0.88750
[Evaluate] best accuracy performence has been updated: 0.88750 --> 0.89375
[Train] epoch: 500/1000, loss: 0.3428545594215393
[Train] epoch: 550/1000, loss: 0.3256698548793793
[Train] epoch: 600/1000, loss: 0.3132207989692688
[Train] epoch: 650/1000, loss: 0.30456796288490295
[Evaluate] best accuracy performence has been updated: 0.89375 --> 0.90000
[Evaluate] best accuracy performence has been updated: 0.90000 --> 0.90625
[Train] epoch: 700/1000, loss: 0.2987373173236847
[Evaluate] best accuracy performence has been updated: 0.90625 --> 0.91250
[Train] epoch: 750/1000, loss: 0.294859379529953
[Train] epoch: 800/1000, loss: 0.292269766330719
[Evaluate] best accuracy performence has been updated: 0.91250 --> 0.91875
[Train] epoch: 850/1000, loss: 0.29051074385643005
[Train] epoch: 900/1000, loss: 0.2892831265926361
[Train] epoch: 950/1000, loss: 0.28839582204818726

[Test] score/loss: 0.8800/0.2849




# Variant of the model with four hidden layers of 5 neurons each
class Model_MLP_L5(torch.nn.Module):
    def __init__(self, input_size, output_size,mean_init=0.,std_init=1.,b_init=0.0):
        super(Model_MLP_L5, self).__init__()
        self.fc1 = torch.nn.Linear(input_size, 5)
        normal_(tensor=self.fc1.weight, mean=mean_init, std=std_init)
        constant_(tensor=self.fc1.bias, val=b_init)
        self.fc2 = torch.nn.Linear(5, 5)
        normal_(tensor=self.fc2.weight, mean=mean_init, std=std_init)
        constant_(tensor=self.fc2.bias, val=b_init)
        self.fc3 = torch.nn.Linear(5, 5)
        normal_(tensor=self.fc3.weight, mean=mean_init, std=std_init)
        constant_(tensor=self.fc3.bias, val=b_init)
        self.fc4 = torch.nn.Linear(5, 5)
        normal_(tensor=self.fc4.weight, mean=mean_init, std=std_init)
        constant_(tensor=self.fc4.bias, val=b_init)
        self.fc5 = torch.nn.Linear(5, output_size)
        normal_(tensor=self.fc5.weight, mean=mean_init, std=std_init)
        constant_(tensor=self.fc5.bias, val=b_init)
        # Use 'torch.nn.functional.sigmoid' as the Logistic activation function
        self.act = F.sigmoid

    # Forward pass
    def forward(self, inputs):
        outputs = self.fc1(inputs)
        outputs = self.act(outputs)
        outputs = self.fc2(outputs)
        outputs = self.act(outputs)
        outputs = self.fc3(outputs)
        outputs = self.act(outputs)
        outputs = self.fc4(outputs)
        outputs = self.act(outputs)
        outputs = self.fc5(outputs)
        outputs = F.sigmoid(outputs)
        return outputs

[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.54375
[Train] epoch: 0/1000, loss: 0.8435155153274536
[Evaluate] best accuracy performence has been updated: 0.54375 --> 0.55625
[Evaluate] best accuracy performence has been updated: 0.55625 --> 0.59375
[Evaluate] best accuracy performence has been updated: 0.59375 --> 0.63750
[Evaluate] best accuracy performence has been updated: 0.63750 --> 0.66875
[Evaluate] best accuracy performence has been updated: 0.66875 --> 0.68750
[Evaluate] best accuracy performence has been updated: 0.68750 --> 0.71250
[Evaluate] best accuracy performence has been updated: 0.71250 --> 0.72500
[Train] epoch: 50/1000, loss: 0.6921094655990601
[Train] epoch: 100/1000, loss: 0.6882262229919434
[Evaluate] best accuracy performence has been updated: 0.72500 --> 0.73125
[Evaluate] best accuracy performence has been updated: 0.73125 --> 0.73750
[Evaluate] best accuracy performence has been updated: 0.73750 --> 0.74375
[Train] epoch: 150/1000, loss: 0.6829259395599365
[Train] epoch: 200/1000, loss: 0.6751701831817627
[Evaluate] best accuracy performence has been updated: 0.74375 --> 0.75000
[Train] epoch: 250/1000, loss: 0.6634606122970581
[Evaluate] best accuracy performence has been updated: 0.75000 --> 0.75625
[Evaluate] best accuracy performence has been updated: 0.75625 --> 0.76250
[Evaluate] best accuracy performence has been updated: 0.76250 --> 0.76875
[Train] epoch: 300/1000, loss: 0.6455010175704956
[Evaluate] best accuracy performence has been updated: 0.76875 --> 0.77500
[Train] epoch: 350/1000, loss: 0.6183261871337891
[Train] epoch: 400/1000, loss: 0.5804553031921387
[Evaluate] best accuracy performence has been updated: 0.77500 --> 0.78125
[Evaluate] best accuracy performence has been updated: 0.78125 --> 0.78750
[Evaluate] best accuracy performence has been updated: 0.78750 --> 0.79375
[Train] epoch: 450/1000, loss: 0.5352076888084412
[Evaluate] best accuracy performence has been updated: 0.79375 --> 0.80000
[Evaluate] best accuracy performence has been updated: 0.80000 --> 0.80625
[Evaluate] best accuracy performence has been updated: 0.80625 --> 0.81250
[Train] epoch: 500/1000, loss: 0.4886578917503357
[Evaluate] best accuracy performence has been updated: 0.81250 --> 0.81875
[Evaluate] best accuracy performence has been updated: 0.81875 --> 0.82500
[Train] epoch: 550/1000, loss: 0.44431859254837036
[Evaluate] best accuracy performence has been updated: 0.82500 --> 0.83125
[Evaluate] best accuracy performence has been updated: 0.83125 --> 0.83750
[Evaluate] best accuracy performence has been updated: 0.83750 --> 0.84375
[Evaluate] best accuracy performence has been updated: 0.84375 --> 0.85000
[Train] epoch: 600/1000, loss: 0.40396183729171753
[Evaluate] best accuracy performence has been updated: 0.85000 --> 0.85625
[Evaluate] best accuracy performence has been updated: 0.85625 --> 0.86250
[Evaluate] best accuracy performence has been updated: 0.86250 --> 0.87500
[Train] epoch: 650/1000, loss: 0.3699793219566345
[Evaluate] best accuracy performence has been updated: 0.87500 --> 0.88125
[Train] epoch: 700/1000, loss: 0.3445335030555725
[Evaluate] best accuracy performence has been updated: 0.88125 --> 0.88750
[Evaluate] best accuracy performence has been updated: 0.88750 --> 0.89375
[Train] epoch: 750/1000, loss: 0.32763513922691345
[Evaluate] best accuracy performence has been updated: 0.89375 --> 0.90000
[Train] epoch: 800/1000, loss: 0.31706300377845764
[Train] epoch: 850/1000, loss: 0.3103279769420624
[Train] epoch: 900/1000, loss: 0.30574506521224976
[Train] epoch: 950/1000, loss: 0.3023308515548706
[Evaluate] best accuracy performence has been updated: 0.90000 --> 0.90625

[Test] score/loss: 0.8700/0.3286


[Test] score/loss: 0.8600/0.4399


[Test] score/loss: 0.4950/0.7104




[Test] score/loss: 0.9000/0.2586


[Test] score/loss: 0.8500/0.5921


[Test] score/loss: 0.8650/0.3518







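  • The number of hidden neurons should be between the size of the input layer and the size of the output layer.
  • The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
  • The number of hidden neurons should be less than twice the size of the input layer.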
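As a small worked example of these rules of thumb, the sketch below evaluates them for the layer sizes used in this experiment (2 inputs, 1 output). The helper name `hidden_size_heuristics` is hypothetical, and the second rule is taken in its commonly quoted form: two thirds of the input size plus the output size.

```python
def hidden_size_heuristics(input_size, output_size):
    """Evaluate the three rules of thumb for the number of hidden neurons."""
    low, high = sorted((input_size, output_size))
    return {
        # Rule 1: somewhere between the output size and the input size.
        "between": (low, high),
        # Rule 2: two thirds of the input size plus the output size.
        "two_thirds_rule": 2 * input_size / 3 + output_size,
        # Rule 3: strictly less than twice the input size.
        "upper_bound": 2 * input_size,
    }

rules = hidden_size_heuristics(input_size=2, output_size=1)
print(rules)
```

These are only starting points for a search; the test scores listed in this section come from trying several sizes in practice.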





[Test] score/loss: 0.8050/0.4815


[Test] score/loss: 0.8850/0.2583


4.4 Optimization Problems


4.4.1 Parameter Initialization



import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.init import constant_, normal_

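class Model_MLP_L2_V4(torch.nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Model_MLP_L2_V4, self).__init__()
        # Define the linear layers with 'torch.nn.Linear'.
        # The first argument (in_features) is the input dimension; the second (out_features) is the output dimension.
        # weight is the weight parameter and bias is the bias parameter; both are set to the constant 0 with 'torch.nn.init.constant_'
        self.fc1 = nn.Linear(input_size, hidden_size)
        # fc1 is zero-initialized as well; the identical fc1 rows in the printed weights indicate this
        constant_(tensor=self.fc1.weight, val=0.0)
        constant_(tensor=self.fc1.bias, val=0.0)
        self.fc2 = nn.Linear(hidden_size, output_size)
        constant_(tensor=self.fc2.weight, val=0.0)
        constant_(tensor=self.fc2.bias, val=0.0)
        # Use 'torch.nn.functional.sigmoid' as the Logistic activation function
        self.act_fn = F.sigmoid

    # Forward pass
    def forward(self, inputs):
        z1 = self.fc1(inputs)
        a1 = self.act_fn(z1)
        z2 = self.fc2(a1)
        a2 = self.act_fn(z2)
        return a2

def print_weights(runner):
    print('The weights of the Layers:')

    for _, param in enumerate(runner.model.named_parameters()):
        # Each item is a (name, Parameter) tuple
        print(param)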


from metric import accuracy
from dataset import make_moons
n_samples = 1000
X, y = make_moons(n_samples=n_samples, shuffle=True, noise=0.15)

num_train = 640
num_dev = 160
num_test = 200

X_train, y_train = X[:num_train], y[:num_train]
X_dev, y_dev = X[num_train:num_train + num_dev], y[num_train:num_train + num_dev]
X_test, y_test = X[num_train + num_dev:], y[num_train + num_dev:]

y_train = y_train.reshape([-1,1])
y_dev = y_dev.reshape([-1,1])
y_test = y_test.reshape([-1,1])

# Set up the model
input_size = 2
hidden_size = 5
output_size = 1
model = Model_MLP_L2_V4(input_size=input_size, hidden_size=hidden_size, output_size=output_size)

# Set up the loss function
loss_fn = F.binary_cross_entropy

# Set up the optimizer
learning_rate = 0.2 #5e-2
optimizer = torch.optim.SGD(model.parameters(),lr=learning_rate)

# Set up the evaluation metric
metric = accuracy

# Other parameters
epoch = 2000
saved_path = 'best_model.pdparams'

# Instantiate the RunnerV2_2 class and pass in the training configuration
runner = RunnerV2_2(model, optimizer, metric, loss_fn)

runner.train([X_train, y_train], [X_dev, y_dev], num_epochs=5, log_epochs=50, save_path="best_model.pdparams",custom_print_log=print_weights)

The weights of the Layers:
('fc1.weight', Parameter containing:
tensor([[-4.1772e-05,  3.4384e-05],
        [-4.1772e-05,  3.4384e-05],
        [-4.1772e-05,  3.4384e-05],
        [-4.1772e-05,  3.4384e-05],
        [-4.1772e-05,  3.4384e-05]], requires_grad=True))
('fc1.bias', Parameter containing:
tensor([8.2898e-07, 8.2898e-07, 8.2898e-07, 8.2898e-07, 8.2898e-07],
       requires_grad=True))
('fc2.weight', Parameter containing:
tensor([[-0.0021, -0.0021, -0.0021, -0.0021, -0.0021]], requires_grad=True))
('fc2.bias', Parameter containing:
tensor([-0.0042], requires_grad=True))
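In the printed weights, every row of fc1.weight is identical and so is every entry of fc2.weight. A rough sketch of why constant initialization produces this symmetry is given below. It is a hand-written backward pass for a single sample in plain Python, not the experiment's code; the sample `x`, the label `y`, and the constant 0.1 are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_weight_grads(x, y, W1, b1, w2, b2):
    """Gradient of the binary cross-entropy loss with respect to each hidden
    unit's input weights, for a 2-layer sigmoid network on one sample."""
    # Forward pass through the hidden layer and the output unit.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y_hat = sigmoid(sum(v * hj for v, hj in zip(w2, h)) + b2)
    # For sigmoid + cross-entropy, dL/dz2 simplifies to (y_hat - y).
    delta2 = y_hat - y
    grads = []
    for v, hj in zip(w2, h):
        # Chain rule through the output weight and the hidden sigmoid.
        delta1 = delta2 * v * hj * (1.0 - hj)
        grads.append([delta1 * xi for xi in x])
    return grads

x, y = [0.5, -1.2], 1.0   # hypothetical sample and label
hidden = 5

# All-zero init: the hidden-layer gradients are exactly zero on the first step.
zero_grads = hidden_weight_grads(
    x, y, [[0.0, 0.0]] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0)

# Any shared constant: gradients are non-zero but identical for every unit.
const_grads = hidden_weight_grads(
    x, y, [[0.1, 0.1]] * hidden, [0.0] * hidden, [0.1] * hidden, 0.0)

print(zero_grads[0], const_grads[0])
```

Because every hidden unit receives the same update, the units stay identical and the layer behaves like a single neuron, which is why random initialization is normally used to break the symmetry.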


plot(runner, "fw-zero.pdf")



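4.4.2 The Vanishing Gradient Problem



The following simple experiment illustrates the vanishing gradient phenomenon in feedforward neural networks and a way to mitigate it.

Model construction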

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.init import constant_, normal_

# Define a multi-layer feedforward neural network
class Model_MLP_L5(torch.nn.Module):
    def __init__(self, input_size, output_size, act='relu',mean_init=0.,std_init=0.01,b_init=1.0):
        super(Model_MLP_L5, self).__init__()
        self.fc1 = torch.nn.Linear(input_size, 3)
        normal_(tensor=self.fc1.weight, mean=mean_init, std=std_init)
        constant_(tensor=self.fc1.bias, val=b_init)
        self.fc2 = torch.nn.Linear(3, 3)
        normal_(tensor=self.fc2.weight, mean=mean_init, std=std_init)
        constant_(tensor=self.fc2.bias, val=b_init)
        self.fc3 = torch.nn.Linear(3, 3)
        normal_(tensor=self.fc3.weight, mean=mean_init, std=std_init)
        constant_(tensor=self.fc3.bias, val=b_init)
        self.fc4 = torch.nn.Linear(3, 3)
        normal_(tensor=self.fc4.weight, mean=mean_init, std=std_init)
        constant_(tensor=self.fc4.bias, val=b_init)
        self.fc5 = torch.nn.Linear(3, output_size)
        normal_(tensor=self.fc5.weight, mean=mean_init, std=std_init)
        constant_(tensor=self.fc5.bias, val=b_init)
        # Select the activation function used by the network
        if act == 'sigmoid':
            self.act = F.sigmoid
        elif act == 'relu':
            self.act = F.relu
        elif act == 'lrelu':
            self.act = F.leaky_relu
        else:
            raise ValueError("Please enter sigmoid relu or lrelu!")

    def forward(self, inputs):
        outputs = self.fc1(inputs)
        outputs = self.act(outputs)
        outputs = self.fc2(outputs)
        outputs = self.act(outputs)
        outputs = self.fc3(outputs)
        outputs = self.act(outputs)
        outputs = self.fc4(outputs)
        outputs = self.act(outputs)
        outputs = self.fc5(outputs)
        outputs = F.sigmoid(outputs)
        return outputs

Training with the Sigmoid function



def print_grads(runner):
    print('The grad of the Layers:')

    for name, parms in runner.model.named_parameters():
        print('-->name:', name, ' -->grad_value:', parms.grad)

# Learning rate
lr = 0.01

# Define the network with sigmoid as the activation function
model =  Model_MLP_L5(input_size=2, output_size=1, act='sigmoid')

# Define the optimizer
optimizer = torch.optim.SGD(model.parameters(),lr=lr)

# Define the loss function (binary cross-entropy)
loss_fn = F.binary_cross_entropy

from metric import accuracy

# Define the evaluation metric
metric = accuracy

# Specify the gradient-printing function
custom_print_log = print_grads

# Instantiate the Runner class
runner = RunnerV2_2(model, optimizer, metric, loss_fn)


# Start training
runner.train([X_train, y_train], [X_dev, y_dev],
            num_epochs=1, log_epochs=None,
            custom_print_log=custom_print_log)

The grad of the Layers:
-->name: fc1.weight  -->grad_value: tensor([[-4.3984e-12,  8.6800e-12],
        [ 3.3543e-12, -6.6174e-12],
        [ 5.9331e-12, -1.1691e-11]])
-->name: fc1.bias  -->grad_value: tensor([ 8.6175e-12, -6.5863e-12, -1.1602e-11])
-->name: fc2.weight  -->grad_value: tensor([[1.2173e-09, 1.2156e-09, 1.2185e-09],
        [1.8435e-09, 1.8409e-09, 1.8453e-09],
        [4.3710e-10, 4.3650e-10, 4.3754e-10]])
-->name: fc2.bias  -->grad_value: tensor([1.6666e-09, 2.5239e-09, 5.9843e-10])
-->name: fc3.weight  -->grad_value: tensor([[-1.4786e-06, -1.4787e-06, -1.4688e-06],
        [-1.7528e-06, -1.7530e-06, -1.7412e-06],
        [ 1.7010e-06,  1.7012e-06,  1.6898e-06]])
-->name: fc3.bias  -->grad_value: tensor([-2.0250e-06, -2.4006e-06,  2.3296e-06])
-->name: fc4.weight  -->grad_value: tensor([[-0.0001, -0.0001, -0.0001],
        [ 0.0007,  0.0007,  0.0007],
        [ 0.0004,  0.0004,  0.0004]])
-->name: fc4.bias  -->grad_value: tensor([-0.0001,  0.0010,  0.0006])
-->name: fc5.weight  -->grad_value: tensor([[0.1861, 0.1863, 0.1867]])
-->name: fc5.bias  -->grad_value: tensor([0.2555])
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.55000

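The results show that the gradient decays as it passes back through each layer, and by the time it reaches the first layer it has almost completely vanished.

Training with the ReLU function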

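# Learning rate
lr = 0.01

# Define the network with ReLU as the activation function
model =  Model_MLP_L5(input_size=2, output_size=1, act='relu')

# Define the optimizer
optimizer = torch.optim.SGD(model.parameters(),lr=lr)

# Define the loss function (binary cross-entropy)
loss_fn = F.binary_cross_entropy

from metric import accuracy

# Define the evaluation metric
metric = accuracy

# Specify the gradient-printing function
custom_print_log = print_grads

# Instantiate the Runner class
runner = RunnerV2_2(model, optimizer, metric, loss_fn)

# Start training
runner.train([X_train, y_train], [X_dev, y_dev],
            num_epochs=1, log_epochs=None,
            custom_print_log=custom_print_log)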

The grad of the Layers:
-->name: fc1.weight  -->grad_value: tensor([[-2.9280e-09,  5.8138e-09],
        [ 2.1988e-09, -4.3659e-09],
        [ 3.9221e-09, -7.7876e-09]])
-->name: fc1.bias  -->grad_value: tensor([ 5.7912e-09, -4.3489e-09, -7.7572e-09])
-->name: fc2.weight  -->grad_value: tensor([[2.1852e-07, 2.1740e-07, 2.1933e-07],
        [3.3131e-07, 3.2962e-07, 3.3254e-07],
        [7.6353e-08, 7.5963e-08, 7.6636e-08]])
-->name: fc2.bias  -->grad_value: tensor([2.1923e-07, 3.3240e-07, 7.6604e-08])
-->name: fc3.weight  -->grad_value: tensor([[-5.2026e-05, -5.2053e-05, -5.0292e-05],
        [-6.1891e-05, -6.1923e-05, -5.9828e-05],
        [ 6.0128e-05,  6.0158e-05,  5.8123e-05]])
-->name: fc3.bias  -->grad_value: tensor([-5.2354e-05, -6.2281e-05,  6.0506e-05])
-->name: fc4.weight  -->grad_value: tensor([[-0.0007, -0.0007, -0.0007],
        [ 0.0051,  0.0052,  0.0052],
        [ 0.0029,  0.0029,  0.0029]])
-->name: fc4.bias  -->grad_value: tensor([-0.0007,  0.0052,  0.0029])
-->name: fc5.weight  -->grad_value: tensor([[0.2526, 0.2539, 0.2567]])
-->name: fc5.bias  -->grad_value: tensor([0.2570])
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.55000
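One way to read the two gradient printouts is to look only at the activation derivatives that the gradient is multiplied by on its way back to fc1. The sketch below is a simplification under stated assumptions: it ignores the weight matrices (which also scale the gradient, and are initialized here with std 0.01), and it assumes every pre-activation is close to 1.0 because b_init=1.0 dominates the tiny weights.

```python
import math

def sigmoid_grad(z):
    """Derivative of the logistic function; never larger than 0.25."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def activation_factor(grad_fn, pre_activation, num_layers):
    """Product of the activation derivatives along num_layers layers."""
    factor = 1.0
    for _ in range(num_layers):
        factor *= grad_fn(pre_activation)
    return factor

# Assumed pre-activation of about 1.0 in all four hidden layers.
z = 1.0
sigmoid_factor = activation_factor(sigmoid_grad, z, 4)
relu_factor = activation_factor(relu_grad, z, 4)

print(sigmoid_factor, relu_factor)
```

Under these assumptions the sigmoid chain shrinks the gradient by a factor of about 1.5e-3 while the ReLU chain leaves it unchanged, which is comparable to the gap between the fc1 gradients in the two printouts. The ReLU gradients in the early layers are still small because the small weights also scale them.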


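4.4.3 The Dying ReLU Problem

The ReLU activation function alleviates the vanishing gradient problem to some extent, but in certain situations it is prone to the dying ReLU problem, which makes the network hard to train. This is because the output of ReLU is always 0 when x<0. If, after an ill-suited parameter update during training, a ReLU neuron fails to activate on all of the training data (that is, its output is 0), the gradients of that neuron's own parameters will be 0 from then on, and it can never be activated again in later training. A simple and effective remedy is to replace the activation function with a ReLU variant such as Leaky ReLU or ELU.

Training with ReLU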


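# Define the network and initialize the biases with a large negative value
model =  Model_MLP_L5(input_size=2, output_size=1, act='relu', b_init=-8.0)

# Rebuild the optimizer so that it tracks the parameters of the new model
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

# Instantiate the Runner class
runner = RunnerV2_2(model, optimizer, metric, loss_fn)

# Start training
runner.train([X_train, y_train], [X_dev, y_dev],
            num_epochs=1, log_epochs=0,
            custom_print_log=print_grads)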

The grad of the Layers:
-->name: fc1.weight  -->grad_value: tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
-->name: fc1.bias  -->grad_value: tensor([0., 0., 0.])
-->name: fc2.weight  -->grad_value: tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
-->name: fc2.bias  -->grad_value: tensor([0., 0., 0.])
-->name: fc3.weight  -->grad_value: tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
-->name: fc3.bias  -->grad_value: tensor([0., 0., 0.])
-->name: fc4.weight  -->grad_value: tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
-->name: fc4.bias  -->grad_value: tensor([0., 0., 0.])
-->name: fc5.weight  -->grad_value: tensor([[0., 0., 0.]])
-->name: fc5.bias  -->grad_value: tensor([-0.4794])
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.45000

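The output shows that with ReLU as the activation function, the dying ReLU problem occurs once these conditions are met: the gradients of the ReLU neurons stay at 0 throughout training, so the parameters cannot be updated.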
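The zero gradients can be reproduced for a single ReLU neuron in plain Python. This is a simplified sketch, not the experiment's code: the three input samples and the small weights are hypothetical, and only the bias values -8.0 and 1.0 mirror the b_init settings used in this section.

```python
def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def neuron_weight_grads(samples, weights, bias, upstream=1.0):
    """Sum over all samples of d relu(w.x + b) / dw, scaled by the
    gradient arriving from the layer above."""
    grads = [0.0] * len(weights)
    for x in samples:
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        local = upstream * relu_grad(z)
        for i, xi in enumerate(x):
            grads[i] += local * xi
    return grads

samples = [[0.5, -1.2], [1.8, 0.3], [-0.7, 0.9]]   # hypothetical inputs
weights = [0.01, -0.02]                            # small weights, as with std 0.01

# Large negative bias: the neuron is inactive on every sample.
dead = neuron_weight_grads(samples, weights, bias=-8.0)
# Positive bias: the neuron is active and receives a gradient.
alive = neuron_weight_grads(samples, weights, bias=1.0)

print(dead, alive)
```

With the bias at -8.0 the pre-activation is negative for every sample, so each sample contributes a zero gradient and the weights never move. This also matches the printout above, where fc5.weight has a zero gradient because its inputs are all zero, while fc5.bias still receives one.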

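To address the dying ReLU problem, a simple and effective remedy is to replace the activation function with a ReLU variant such as Leaky ReLU or ELU. Next, we look at the gradients after switching the activation function to Leaky ReLU.

Training with Leaky ReLU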

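# Redefine the network with the Leaky ReLU activation function
model =  Model_MLP_L5(input_size=2, output_size=1, act='lrelu', b_init=-8.0)

# Rebuild the optimizer so that it tracks the parameters of the new model
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

# Instantiate the Runner class
runner = RunnerV2_2(model, optimizer, metric, loss_fn)

# Start training
runner.train([X_train, y_train], [X_dev, y_dev],
            num_epochs=1, log_epochs=None,
            custom_print_log=print_grads)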

The grad of the Layers:
-->name: fc1.weight  -->grad_value: tensor([[-1.0685e-16,  1.4224e-17],
        [ 8.0243e-17, -1.0681e-17],
        [ 1.4313e-16, -1.9052e-17]])
-->name: fc1.bias  -->grad_value: tensor([-1.0803e-16,  8.1126e-17,  1.4471e-16])
-->name: fc2.weight  -->grad_value: tensor([[3.2707e-14, 3.2668e-14, 3.2706e-14],
        [4.9589e-14, 4.9531e-14, 4.9589e-14],
        [1.1428e-14, 1.1415e-14, 1.1428e-14]])
-->name: fc2.bias  -->grad_value: tensor([-4.0897e-13, -6.2007e-13, -1.4290e-13])
-->name: fc3.weight  -->grad_value: tensor([[-7.8125e-10, -7.8125e-10, -7.8099e-10],
        [-9.2938e-10, -9.2939e-10, -9.2907e-10],
        [ 9.0290e-10,  9.0291e-10,  9.0260e-10]])
-->name: fc3.bias  -->grad_value: tensor([ 9.7662e-09,  1.1618e-08, -1.1287e-08])
-->name: fc4.weight  -->grad_value: tensor([[-1.0404e-06, -1.0405e-06, -1.0405e-06],
        [ 7.7293e-06,  7.7304e-06,  7.7305e-06],
        [ 4.3782e-06,  4.3788e-06,  4.3789e-06]])
-->name: fc4.bias  -->grad_value: tensor([ 1.3006e-05, -9.6626e-05, -5.4733e-05])
-->name: fc5.weight  -->grad_value: tensor([[0.0383, 0.0383, 0.0383]])
-->name: fc5.bias  -->grad_value: tensor([-0.4794])
[Evaluate] best accuracy performence has been updated: 0.00000 --> 0.45000

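The output shows that after switching the activation function to Leaky ReLU, the dying ReLU problem is alleviated: the gradients are no longer zero and the parameters can be updated. However, because the slope of Leaky ReLU for x<0 defaults to only 0.01, the gradient values become smaller and smaller during backpropagation as the network gets deeper. To mitigate this, increase the slope of Leaky ReLU for x<0.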
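The effect of the negative slope can be estimated with a short plain-Python sketch. It assumes that every pre-activation is negative (about -8.0, as with b_init=-8.0) and ignores the weight matrices, so it only shows the part of the shrinkage caused by the activation itself. The slope 0.2 is an arbitrary larger value chosen for comparison, not a recommendation from the original experiment.

```python
import math

def leaky_relu_grad(z, negative_slope=0.01):
    """Derivative of Leaky ReLU: 1 for positive inputs, the slope otherwise."""
    return 1.0 if z > 0 else negative_slope

def chain_factor(pre_activation, num_layers, negative_slope):
    """Product of the Leaky ReLU derivatives along num_layers layers."""
    factor = 1.0
    for _ in range(num_layers):
        factor *= leaky_relu_grad(pre_activation, negative_slope)
    return factor

z = -8.0  # assumed pre-activation, dominated by the bias

default_factor = chain_factor(z, 4, negative_slope=0.01)  # 0.01 ** 4
larger_factor = chain_factor(z, 4, negative_slope=0.2)    # 0.2 ** 4

print(default_factor, larger_factor)
```

In PyTorch the slope is set through the `negative_slope` argument of `F.leaky_relu`. Since the model above stores `F.leaky_relu` directly as its activation, using a different slope would require wrapping the call, for example with `functools.partial`.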




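  • Network structure: how many layers should the network have, and how should the inputs and outputs be designed to be most effective?


  • Mathematical theory shows that a three-layer neural network can approximate any nonlinear continuous function to arbitrary precision. Why, then, are deep networks still needed?


  • How should the activation function be chosen for different applications?


  • How should the learning rate be chosen?


  • How many training epochs give the best-performing model?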




Added: 2022-10-08 20:42:06  Updated: 2022-10-08 20:42:52
