[Artificial Intelligence] Overfitting & Underfitting || Deep Learning || PyTorch || Dive into Deep Learning 11 || 跟李沐学AI


When I set out, the willows swayed gently. Now that I return, snow falls thick and fast. (from "Cai Wei", 《采薇》)

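This post is a code implementation of lesson 11 of 跟李沐学AI's Dive into Deep Learning course: model selection, overfitting, and underfitting. It fits a linear regression model to a self-generated dataset to simulate overfitting and underfitting, and shows how the training loss and the test loss change when the fit goes wrong.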

import math
import numpy as np
import torch
from torch import nn
from d2l import torch as d2l

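The training and test datasets are generated from the following third-degree polynomial, where $\epsilon$ is Gaussian noise with standard deviation 0.1:
$y = 5 + \frac{6}{5}\frac{x}{1!} - \frac{17}{5}\frac{x^2}{2!} + \frac{28}{5}\frac{x^3}{3!} + \epsilon$
In practice the data is generated with 20 features; the coefficients of the last 16 terms are simply 0.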

max_degree = 20
n_train, n_test = 100, 100
true_w = np.zeros(max_degree)
true_w[0: 4] = np.array([5.0, 1.2, -3.4, 5.6])

features = np.random.normal(size=(n_train + n_test, 1))
np.random.shuffle(features)   # shuffles in place, so features itself is now reordered
poly_features = np.power(features, np.arange(max_degree).reshape(1, -1))
for i in range(max_degree):
    poly_features[:, i] /= math.gamma(i + 1)

labels = np.dot(poly_features, true_w)
labels += np.random.normal(scale=0.1, size=labels.shape)

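The generation process above is not easy to follow in one pass, so it is broken down step by step below.
The weight vector true_w is first created as an all-zero array whose length is the feature dimension, 20, and then its first four elements are replaced.

max_degree = 20                                  # feature dimension
n_train, n_test = 100, 100                       # sizes of the training and test sets
true_w = np.zeros(max_degree)                    # array of 20 zeros
true_w[0: 4] = np.array([5.0, 1.2, -3.4, 5.6])   # replace the first four elements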

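Generate features, an array with 1 column and n_train + n_test = 200 rows drawn from the standard normal distribution. This is the starting point from which the data for all the other terms is derived.
Then shuffle features.

features = np.random.normal(size=(n_train + n_test, 1))
np.random.shuffle(features)   # shuffles in place, so features itself is now reordered

How np.random.shuffle() behaves: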

X = np.arange(12)
print('Before shuffling: ', X)
np.random.shuffle(X)
print('After shuffling : ', X)

Before shuffling:  [ 0  1  2  3  4  5  6  7  8  9 10 11]
After shuffling :  [10  0  9  3  5  2  8  4  1 11  6  7]
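The example above uses a 1-D array, while features has shape (200, 1). The following is a minimal sketch of what happens in the 2-D case, using a small hypothetical array named col in place of features: the rows are permuted in place, the shape is unchanged, and the call itself returns None. The seed is only there to make the sketch repeatable.

```python
import numpy as np

np.random.seed(0)  # fixed seed, only so the sketch is repeatable

# A small stand-in for features: one column, six rows
col = np.arange(6, dtype=float).reshape(-1, 1)
before = col.copy()

# shuffle permutes along the first axis (rows) in place and returns None
result = np.random.shuffle(col)

print(result)        # None
print(col.shape)     # (6, 1)
print(col.ravel())   # same values as before, in a new order
```

Because the values in features are independent random draws to begin with, shuffling them does not change their distribution; it only reorders the rows.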

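np.power() raises each element to the corresponding power. Here features plays the role of the input $x$ and is used to produce the other features, i.e. $x^2, x^3, x^4, \dots$
Pay attention to the shapes of features and of the exponent array: the former is (200, 1) and the latter is (1, 20). With any other pair of shapes, broadcasting will not produce the intended (200, 20) result.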

poly_features = np.power(features, np.arange(max_degree).reshape(1, -1))

See the following example:

np.power(np.array([[1], [2], [3], [4]]), np.array([[1, 2, 3, 4]]))

array([[  1,   1,   1,   1],
       [  2,   4,   8,  16],
       [  3,   9,  27,  81],
       [  4,  16,  64, 256]], dtype=int32)
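To make the shape requirement concrete, here is a small sketch (the names x, degrees, table and flat are only for illustration). A column of shape (3, 1) against a row of shape (1, 4) broadcasts to a (3, 4) table, whereas two 1-D arrays are either paired element by element or rejected when their lengths differ.

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])       # shape (3, 1), like features
degrees = np.arange(4).reshape(1, -1)     # shape (1, 4), like the exponent row

# (3, 1) with (1, 4) broadcasts to (3, 4): every x raised to every degree
table = np.power(x, degrees)
print(table.shape)
print(table)

# Two 1-D arrays of equal length are paired elementwise instead: 1^0, 2^1, 3^2
flat = np.power(x.ravel(), np.arange(3))
print(flat)

# 1-D arrays of different lengths cannot be broadcast at all
try:
    np.power(x.ravel(), np.arange(4))
except ValueError as err:
    print('shape mismatch:', err)
```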

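At this point the raw feature values are complete; next they need to be plugged into the function.
This small loop divides each column by the corresponding factorial.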

for i in range(max_degree):
    poly_features[:, i] /= math.gamma(i + 1)

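For gamma() in the built-in math module, a positive integer argument n returns the factorial of the preceding integer, (n - 1)!, which is why the loop adds 1 to get the matching factorial.

math.gamma(1), math.gamma(2), math.gamma(3), math.gamma(4), math.gamma(5)  # 0!, 1!, 2!, 3!, 4!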

(1.0, 1.0, 2.0, 6.0, 24.0)
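As a quick sanity check of the gamma and factorial relationship over the whole range used here (degrees 0 to 19), the sketch below compares math.gamma(i + 1) with math.factorial(i). math.isclose is used rather than == because gamma returns a float.

```python
import math

# gamma(i + 1) should agree with i! for every degree used in the loop
for i in range(20):
    assert math.isclose(math.gamma(i + 1), math.factorial(i))

# factorial returns exact integers, gamma returns floats
factorials = [math.factorial(i) for i in range(5)]
gammas = [math.gamma(i + 1) for i in range(5)]
print(factorials)
print(gammas)
```

Using math.factorial(i) inside the loop would be an equivalent choice for these integer degrees; gamma is simply what the original code uses.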

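Finally, multiply each row of the feature matrix element by element with the weight vector and sum the products, then add a small random perturbation to the result.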

labels = np.dot(poly_features, true_w)
labels += np.random.normal(scale=0.1, size=labels.shape)
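The following sketch checks, on a handful of points and without the noise term, that the matrix product really evaluates the polynomial with coefficients 5.0, 1.2, -3.4 and 5.6. The names x1, via_matrix and direct are introduced only for this check, and the sample size of 5 is arbitrary.

```python
import math
import numpy as np

np.random.seed(0)  # fixed seed, only so the sketch is repeatable
max_degree = 20
true_w = np.zeros(max_degree)
true_w[0:4] = [5.0, 1.2, -3.4, 5.6]

# Same construction as in the post, on 5 sample points
x = np.random.normal(size=(5, 1))
poly = np.power(x, np.arange(max_degree).reshape(1, -1))
for i in range(max_degree):
    poly[:, i] /= math.gamma(i + 1)

# Route 1: the matrix product used in the post (noise left out)
via_matrix = np.dot(poly, true_w)

# Route 2: the polynomial written out term by term
x1 = x[:, 0]
direct = 5.0 + 1.2 * x1 - 3.4 * x1**2 / 2 + 5.6 * x1**3 / 6

print(np.allclose(via_matrix, direct))
```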

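The notation for these operations is easy to mix up: multiplying corresponding elements and then summing is written differently from multiplying corresponding elements without summing.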

X, w = np.arange(24).reshape(6, 4), np.array([0.5, 0, 1, 10])
X, np.dot(X, w), X @ w, X * w

(array([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]),
 array([ 32.,  78., 124., 170., 216., 262.]),
 array([ 32.,  78., 124., 170., 216., 262.]),
 array([[  0.,   0.,   2.,  30.],
        [  2.,   0.,   6.,  70.],
        [  4.,   0.,  10., 110.],
        [  6.,   0.,  14., 150.],
        [  8.,   0.,  18., 190.],
        [ 10.,   0.,  22., 230.]]))
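The link between the two forms can be spelled out directly: summing the elementwise product along each row gives the same vector as the matrix-vector product. A minimal sketch with the same X and w as above:

```python
import numpy as np

X = np.arange(24).reshape(6, 4)
w = np.array([0.5, 0, 1, 10])

elementwise = X * w                   # shape (6, 4): w is broadcast across the rows
row_sums = elementwise.sum(axis=1)    # shape (6,): sum the products within each row
matvec = X @ w                        # shape (6,): same as np.dot(X, w)

print(elementwise.shape, row_sums.shape, matvec.shape)
print(np.allclose(row_sums, matvec))
```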

Convert the NumPy arrays to Torch tensors

true_w, features, poly_features, labels = [torch.tensor(x, dtype=torch.float32) for x in [true_w, features, poly_features, labels]]
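A short sketch of what this conversion does, using a throwaway array a: torch.tensor copies the data and casts it to the requested dtype, while torch.from_numpy (shown only for contrast, it is not used in the post) shares memory with the NumPy array and keeps its float64 dtype. The cast to float32 matters because nn.Linear parameters are float32 by default.

```python
import numpy as np
import torch

a = np.zeros(3)                                  # NumPy defaults to float64

copied = torch.tensor(a, dtype=torch.float32)    # independent copy, cast to float32
shared = torch.from_numpy(a)                     # shares memory with a, stays float64

a[0] = 1.0                                       # modify the NumPy array afterwards

print(copied.dtype, copied[0].item())            # the copy does not see the change
print(shared.dtype, shared[0].item())            # the shared tensor does
```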

Look at a few of the features and labels, although calling them labels does not feel quite right here

features[: 4], poly_features[: 4, : ], labels[: 4]

(tensor([[ 0.4296],
         [ 1.0613],
         [-1.1355],
         [-0.4227]]),
 tensor([[ 1.0000e+00,  4.2957e-01,  9.2267e-02,  1.3212e-02,  1.4189e-03,
           1.2190e-04,  8.7275e-06,  5.3559e-07,  2.8759e-08,  1.3727e-09,
           5.8967e-11,  2.3028e-12,  8.2435e-14,  2.7240e-15,  8.3582e-17,
           2.3936e-18,  6.4265e-20,  1.6239e-21,  3.8755e-23,  8.7622e-25],
         [ 1.0000e+00,  1.0613e+00,  5.6315e-01,  1.9922e-01,  5.2856e-02,
           1.1219e-02,  1.9844e-03,  3.0085e-04,  3.9911e-05,  4.7063e-06,
           4.9946e-07,  4.8188e-08,  4.2617e-09,  3.4791e-10,  2.6373e-11,
           1.8659e-12,  1.2377e-13,  7.7265e-15,  4.5555e-16,  2.5445e-17],
         [ 1.0000e+00, -1.1355e+00,  6.4463e-01, -2.4398e-01,  6.9258e-02,
          -1.5728e-02,  2.9764e-03, -4.8279e-04,  6.8523e-05, -8.6450e-06,
           9.8160e-07, -1.0132e-07,  9.5874e-09, -8.3739e-10,  6.7915e-11,
          -5.1410e-12,  3.6483e-13, -2.4368e-14,  1.5371e-15, -9.1861e-17],
         [ 1.0000e+00, -4.2269e-01,  8.9333e-02, -1.2587e-02,  1.3301e-03,
          -1.1244e-04,  7.9212e-06, -4.7831e-07,  2.5272e-08, -1.1869e-09,
           5.0170e-11, -1.9278e-12,  6.7906e-14, -2.2079e-15,  6.6662e-17,
          -1.8785e-18,  4.9626e-20, -1.2339e-21,  2.8975e-23, -6.4461e-25]]),
 tensor([5.3036, 5.5432, 0.1252, 4.0034]))

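Define a function that evaluates the model's average loss on a given dataset

def evaluate_loss(net, data_iter, loss):
    # Accumulator with two slots: sum of loss values, number of loss values
    metric = d2l.Accumulator(2)
    # Iterate over the batches of the data iterator
    for X, y in data_iter:
        o = net(X)
        L = loss(o, y.reshape(o.shape))
        # .numel() returns the number of elements in a tensor; with the default
        # nn.MSELoss() used in train() below, L is a scalar batch mean, so numel() is 1
        metric.add(L.sum(), L.numel())
    return metric[0] / metric[1]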
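What L.sum() and L.numel() contain depends on the reduction setting of the loss. The sketch below, on three made-up predictions, contrasts reduction='none' (one squared error per element) with the default reduction='mean' (a single scalar). With the default, evaluate_loss ends up averaging the per-batch means, which equals the overall mean when all batches have the same size.

```python
import torch
from torch import nn

# Three made-up predictions and targets; squared errors are 0, 4 and 9
pred = torch.tensor([[1.0], [2.0], [4.0]])
target = torch.tensor([[1.0], [0.0], [1.0]])

per_elem = nn.MSELoss(reduction='none')(pred, target)   # shape (3, 1)
mean_loss = nn.MSELoss()(pred, target)                  # scalar, default reduction='mean'

print(per_elem.flatten().tolist(), per_elem.numel())
print(mean_loss.item(), mean_loss.numel())

# Sum of per-element losses divided by their count equals the mean loss
print(torch.isclose(per_elem.sum() / per_elem.numel(), mean_loss).item())
```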

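Define a training function that looks complicated
In fact it only gathers pieces used earlier into one function: building the model, creating the training and test data iterators, instantiating the loss function and the optimizer, and training the model for the given number of epochs.
This function is called several times below to show normal fitting, underfitting, and overfitting, so bundling everything into one function pays off.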

def train(train_features, test_features, train_labels, test_labels, num_epoch=400):

    # Number of input features, taken from the last dimension of the data
    input_shape = train_features.shape[-1]
    # Build the model; bias=False because the constant term is already the x^0 column of the features
    net = nn.Sequential(nn.Linear(input_shape, 1, bias=False))

    # Batch size
    batch_size = min(10, train_labels.shape[0])
    # Data iterators for the training set and the test set
    train_iter = d2l.load_array((train_features, train_labels.reshape(-1, 1)), batch_size)
    test_iter = d2l.load_array((test_features, test_labels.reshape(-1, 1)), batch_size, is_train=False)

    # Loss function
    loss = nn.MSELoss()
    # Optimizer
    updater = torch.optim.SGD(net.parameters(), lr=0.01)

    # Animator that plots the loss curves
    animator = d2l.Animator(xlabel='Epoch', ylabel='Loss', yscale='log',
                           xlim=[1, num_epoch], ylim=[1e-3, 1e2], legend=['Train', 'Test'])

    # Train the model
    for epoch in range(num_epoch):
        # Reuse the one-epoch training helper from the earlier lessons
        d2l.train_epoch_ch3(net, train_iter, loss, updater)
        # Only plot the first epoch and every 20th epoch
        if epoch == 0 or (epoch + 1) % 20 == 0:
            animator.add(epoch + 1, (evaluate_loss(net, train_iter, loss), evaluate_loss(net, test_iter, loss)))

    # Print the fitted weights
    print('Weight: ', net[0].weight.data.numpy())
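For readers who want to see what the d2l helpers are doing, or who do not have d2l installed, the following is a rough sketch of the same kind of training loop in plain PyTorch. It is not the d2l implementation: make_poly_data and fit_linear are hypothetical helper names, plotting is left out, the epoch count is reduced to 200, and the loss is recorded on the full training set after every epoch.

```python
import math
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def make_poly_data(n, degree, true_w, noise=0.1):
    """Hypothetical helper: features x^i / i! for i < degree, plus noisy labels."""
    x = torch.randn(n, 1)
    feats = torch.cat([x ** i / math.factorial(i) for i in range(degree)], dim=1)
    labels = feats[:, :len(true_w)] @ true_w + noise * torch.randn(n)
    return feats, labels

def fit_linear(feats, labels, num_epochs=200, lr=0.01, batch_size=10):
    """Hypothetical helper: minibatch SGD on a bias-free linear model."""
    net = nn.Linear(feats.shape[1], 1, bias=False)
    loss_fn = nn.MSELoss()
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    targets = labels.reshape(-1, 1)
    loader = DataLoader(TensorDataset(feats, targets), batch_size=batch_size, shuffle=True)
    history = []
    for _ in range(num_epochs):
        for X, y in loader:
            opt.zero_grad()
            loss_fn(net(X), y).backward()
            opt.step()
        # Record the loss on the full training set after each epoch
        with torch.no_grad():
            history.append(loss_fn(net(feats), targets).item())
    return net, history

torch.manual_seed(0)  # fixed seed, only so the sketch is repeatable
true_w = torch.tensor([5.0, 1.2, -3.4, 5.6])
feats, labels = make_poly_data(100, 4, true_w)
net, history = fit_linear(feats, labels)

print('first / last epoch loss:', history[0], history[-1])
print('weights:', net.weight.data.numpy())
```

Passing a different degree to make_poly_data, or slicing feats before calling fit_linear, would reproduce the underfitting and overfitting setups in the same way the train() calls below do.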

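Pass the first 4 features, i.e. the 4 terms with nonzero coefficients, to the model to see how well it fits the third-degree polynomial in the normal case.
As the number of epochs grows, both the training loss and the test loss decrease, although the test loss stays above the training loss throughout. The parameters learned in the end are very close to the ones actually used to generate the data.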

train(poly_features[: n_train, :4], poly_features[n_train: , : 4],
     labels[: n_train], labels[n_train: ])
# true_w: [5.0, 1.2, -3.4, 5.6]

Weight:  [[ 4.9893193  1.3259962 -3.335305   5.328977 ]]

Figure: normal fit of the model to the data

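If the model is given only 2 features, then past a certain point in training it has extracted all the information those features carry, and further training does not help.
Put differently, the model has reached its optimum yet still cannot fit the data well. This is underfitting.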

train(poly_features[: n_train, : 2], poly_features[n_train: , : 2],
     labels[: n_train], labels[n_train: ])

Weight:  [[3.9040396 2.6143055]]

Figure: the model underfits the data
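The claim that underfitting is a limit of the model rather than of the training time can be illustrated without SGD. The sketch below swaps in ordinary least squares (np.linalg.lstsq), which gives the best possible weights for a given feature set directly, and compares the training error with 2 features against 4 features on freshly generated data. The helper name best_fit_mse is hypothetical, and the data is regenerated here with its own seed, so the numbers will not match the run in the post.

```python
import math
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is repeatable
true_w = np.array([5.0, 1.2, -3.4, 5.6])

# Features 1, x, x^2/2!, x^3/3! and noisy labels, as in the post
x = rng.normal(size=(100, 1))
feats = np.power(x, np.arange(4).reshape(1, -1))
feats = feats / np.array([math.factorial(i) for i in range(4)])
labels = feats @ true_w + rng.normal(scale=0.1, size=100)

def best_fit_mse(F, y):
    """Hypothetical helper: least-squares weights and their training MSE."""
    w, *_ = np.linalg.lstsq(F, y, rcond=None)
    return w, float(np.mean((F @ w - y) ** 2))

w2, mse2 = best_fit_mse(feats[:, :2], labels)   # only 1 and x
w4, mse4 = best_fit_mse(feats, labels)          # all four terms

print('2 features:', w2, mse2)
print('4 features:', w4, mse4)
```

Even the optimal 2-feature weights leave a far larger error than the 4-feature fit, because the quadratic and cubic terms cannot be expressed with a constant and a linear term alone.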

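This time the model is given all the features. It will certainly try hard to match the weights of the 4 features that matter, but the influence of the other 16 features is hard to ignore: weights that should all be 0 end up with values that are nonzero at the second decimal place, and some are much larger.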

train(poly_features[: n_train, : ], poly_features[n_train: , : ],
     labels[: n_train], labels[n_train: ], 1200)

Weight:  [[ 5.020522    1.3296317  -3.4379206   5.0934463   0.01132164  1.4248109
   0.1329909   0.04230715  0.15764914  0.11042187  0.15720691 -0.07347104
   0.20190383 -0.16957346 -0.18983693 -0.04662097  0.09221023 -0.1726413
  -0.04237635  0.01195528]]

Figure: the model overfits the data
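To put a number on how far the 16 extra weights drift from 0, the sketch below takes the weights printed by the 20-feature run above and summarizes the ones that should have been 0. The threshold of 0.01 is an arbitrary choice for illustration.

```python
import numpy as np

# Weights printed by the 20-feature run above
fitted = np.array([
    5.020522, 1.3296317, -3.4379206, 5.0934463, 0.01132164, 1.4248109,
    0.1329909, 0.04230715, 0.15764914, 0.11042187, 0.15720691, -0.07347104,
    0.20190383, -0.16957346, -0.18983693, -0.04662097, 0.09221023, -0.1726413,
    -0.04237635, 0.01195528,
])

extra = fitted[4:]                                    # the 16 weights whose true value is 0
n_above = int((np.abs(extra) > 0.01).sum())           # how many exceed 0.01 in magnitude
largest = float(np.abs(extra).max())                  # largest spurious weight
largest_degree = int(np.abs(extra).argmax()) + 4      # which polynomial degree it belongs to

print(n_above, 'of', extra.size, 'extra weights exceed 0.01 in magnitude')
print('largest spurious weight:', largest, 'at degree', largest_degree)
```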