[Artificial Intelligence] PyTorch Study Notes (2): The nn Module

Author: token keyword

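PyTorch's nn module provides tools for building and training neural networks. It is designed specifically for deep learning, and its core data structure is Module. Module is an abstract concept: it can represent a single layer of a neural network, or an entire network made up of many layers.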

nn.Module

The constructor of the nn.Module base class (abridged):

    def __init__(self):
        self.training = True
        self._parameters = OrderedDict()
        self._buffers = OrderedDict()
        self._backward_hooks = OrderedDict()
        self._forward_hooks = OrderedDict()
        self._forward_pre_hooks = OrderedDict()
        self._state_dict_hooks = OrderedDict()
        self._load_state_dict_pre_hooks = OrderedDict()
        self._modules = OrderedDict()

Some of these attributes are explained below:

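training: Some layers, such as Dropout and BatchNorm, behave differently during training and testing; the value of training determines which forward-pass behavior is used.
_parameters: Stores the parameters set directly by the user.
_buffers: Buffers, i.e. persistent state that is not a learnable parameter.
*_hooks: Store and manage hook functions, which can be used to extract intermediate values.
_modules: Child modules.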
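
These attributes can be observed on a small module. The sketch below is only an illustration: Demo is a hypothetical module, and it reads the private fields _parameters, _buffers and _modules purely to show where things end up (normal code would use parameters(), buffers() and children() instead).

```python
import torch
from torch import nn

class Demo(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(3))       # stored in _parameters
        self.register_buffer("count", torch.zeros(1))  # stored in _buffers
        self.drop = nn.Dropout(p=0.5)                   # stored in _modules

    def forward(self, x):
        return self.drop(x * self.scale)

demo = Demo()
param_names = list(demo._parameters.keys())
buffer_names = list(demo._buffers.keys())
module_names = list(demo._modules.keys())
print(param_names, buffer_names, module_names)

x = torch.ones(2, 3)
demo.eval()          # sets training=False on demo and all child modules
eval_out = demo(x)   # Dropout does nothing in eval mode
demo.train()         # sets training=True again
print(demo.training, eval_out)
```

Calling eval() or train() is the usual way to flip the training flag, since it also propagates to every child module.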

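In practice, the most common approach is to subclass nn.Module to write custom layers. Keep the following points in mind:

A custom layer must inherit from nn.Module and call nn.Module's constructor in its own constructor.
Learnable parameters must be defined in the constructor __init__.
Implement the forward pass in the forward function.
There is no need to write a backward function; nn.Module relies on autograd to perform backpropagation automatically.
A Module's learnable parameters are available as iterators through named_parameters() or parameters().

A simple fully connected layer and a multilayer perceptron built with nn.Module: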

# -*- coding: utf-8 -*-
# create on 2021-06-29
# author: yang

import torch
from torch import nn

# Fully connected layer
class Linear(nn.Module):
    def __init__(self, in_features, out_features):
        super(Linear, self).__init__() # or nn.Module.__init__(self)
        self.w = nn.Parameter(torch.randn(in_features, out_features))
        self.b = nn.Parameter(torch.randn(out_features))

    def forward(self, x):
        x = x.mm(self.w)
        return x + self.b.expand_as(x)

# Multilayer perceptron
class Perceptron(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        nn.Module.__init__(self)
        self.layer1 = Linear(in_features, hidden_features)
        self.layer2 = Linear(hidden_features, out_features)

    def forward(self, x):
        x = self.layer1(x)
        x = torch.sigmoid(x)
        return self.layer2(x)

if __name__ == '__main__':
    layer = Linear(4, 3)
    for name, parameter in layer.named_parameters():
        print(name, parameter)

    perceptron = Perceptron(3, 4, 1)
    for name, parameter in perceptron.named_parameters():
        print(name, parameter)

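Common Neural Network Layers

The nn module already packages many neural network layers, including convolution, pooling and activation layers. In practice, ModuleList and Sequential simplify network construction: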

# Sequential
# eg1:
net1 = nn.Sequential()
net1.add_module('conv', nn.Conv2d(3, 3, 3))
net1.add_module('batchnorm', nn.BatchNorm2d(3))
net1.add_module('relu', nn.ReLU())

# eg2:
net2 = nn.Sequential(nn.Conv2d(3, 3, 3),
                     nn.BatchNorm2d(3),
                     nn.ReLU()
                     )
# eg3:
from collections import OrderedDict
net3 = nn.Sequential(OrderedDict
                     ([('conv1', nn.Conv2d(3, 3, 3)),
                       ('batchnorm', nn.BatchNorm2d(3)),
                       ('relu', nn.ReLU())
                       ]))

# ModuleList
# Conv2d requires a kernel size; its output channels match BatchNorm2d(3)
modulelist = nn.ModuleList([nn.Conv2d(3, 3, 3), nn.BatchNorm2d(3), nn.ReLU()])
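
The practical difference between the two containers may be easier to see in use. The following is a minimal sketch, assuming a hypothetical module named Stack: a ModuleList only registers its layers and has to be iterated by hand in forward, while a Sequential chains them automatically.

```python
import torch
from torch import nn

class Stack(nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()
        # ModuleList registers each layer as a submodule, so its parameters
        # appear in parameters(); a plain Python list would not do that.
        self.layers = nn.ModuleList(
            [nn.Conv2d(3, 3, 3), nn.BatchNorm2d(3), nn.ReLU()]
        )

    def forward(self, x):
        # ModuleList has no forward of its own: call the layers explicitly
        for layer in self.layers:
            x = layer(x)
        return x

seq = nn.Sequential(nn.Conv2d(3, 3, 3), nn.BatchNorm2d(3), nn.ReLU())
x = torch.randn(2, 3, 8, 8)
out_stack = Stack()(x)
out_seq = seq(x)  # Sequential applies the layers in order
n_params = len(list(Stack().parameters()))
print(out_stack.shape, out_seq.shape, n_params)
```

ModuleList tends to be the better fit when the forward pass is not a straight chain, for example when intermediate outputs are reused.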

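Loss Functions

Deep learning uses many different loss functions. They can be viewed as special layers, and PyTorch implements them as subclasses of nn.Module.

Taking the cross-entropy loss CrossEntropyLoss as an example: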

# Loss Function
score = torch.randn(3, 2)
label = torch.tensor([1, 0, 1])  # class indices (int64)
criterion = nn.CrossEntropyLoss()
loss = criterion(score, label)
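
To make it clearer what CrossEntropyLoss computes with its default settings, the sketch below reproduces the value by hand from the raw scores. The seed and the variable names log_prob and manual are my own choices for illustration.

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)
score = torch.randn(3, 2)        # raw logits: 3 samples, 2 classes
label = torch.tensor([1, 0, 1])  # class indices (int64)

loss = nn.CrossEntropyLoss()(score, label)

# Same quantity by hand: take the log-softmax, pick the log-probability
# of the true class for each sample, negate, and average over the batch.
log_prob = F.log_softmax(score, dim=1)
manual = -log_prob[torch.arange(3), label].mean()
print(loss.item(), manual.item())
```

Note that the loss expects unnormalized scores, so no softmax layer should be applied before it.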

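Optimizers

torch.optim packages many optimization methods commonly used in deep learning. They all inherit from the base class optim.Optimizer and implement their own update step. Taking the most basic one, stochastic gradient descent (SGD), as an example:

First define the model structure
Choose a suitable optimization method
Set the learning rate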

from torch import optim

# Optimizer
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(6, 16, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)
        )

        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, 10)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(-1, 16 * 5 * 5)
        x = self.classifier(x)
        return x

net = Net()
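# Use a learning rate of 0.01
optimizer = optim.SGD(params=net.parameters(), lr=0.01)
optimizer.zero_grad()

input = torch.randn(1, 3, 32, 32)
output = net(input)
# Fake backward pass: feed the output itself in as the upstream gradient
output.backward(output)

# Perform the optimization step
optimizer.step()

# Set different learning rates for different sub-networks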
optimizer = optim.SGD([{'params': net.features.parameters()},
                       {'params': net.classifier.parameters(), 'lr': 1e-2}], lr=1e-5)
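
The sketch below shows per-group learning rates together with one ordinary training step using a real loss. The tiny model, the random data and the MSE loss are all made up for illustration and are not part of the original example.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))  # hypothetical model
optimizer = optim.SGD(
    [{"params": model[0].parameters()},               # falls back to the default lr
     {"params": model[2].parameters(), "lr": 1e-2}],  # overrides the default
    lr=1e-5,
)
lrs = [group["lr"] for group in optimizer.param_groups]

x, y = torch.randn(16, 4), torch.randn(16, 1)  # made-up data
criterion = nn.MSELoss()
before = model[2].weight.detach().clone()

optimizer.zero_grad()          # clear gradients from any previous step
loss = criterion(model(x), y)  # forward pass
loss.backward()                # compute gradients
optimizer.step()               # update the parameters

changed = not torch.equal(before, model[2].weight)
print(lrs, loss.item(), changed)
```

The learning rate of a group can also be changed later by assigning to optimizer.param_groups[i]["lr"].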

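Parameter Initialization

Parameter initialization matters a great deal in deep learning: a good initialization helps the model converge faster and reach better performance. The parameters of PyTorch's nn.Module layers already use reasonable default initialization strategies, but custom initialization can replace the defaults. The nn.init module is designed for this purpose and implements the common strategies.

Xavier normal (Gaussian) initialization with init: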

from torch.nn import init

linear = nn.Linear(3, 4)
torch.manual_seed(1)
init.xavier_normal_(linear.weight)
print(linear.weight.data)
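
For a whole network, one possible approach is to pass an initialization function to Module.apply, which visits every submodule. This is a minimal sketch; init_weights is a hypothetical helper name and the choice to zero the biases is my own assumption.

```python
import torch
from torch import nn
from torch.nn import init

def init_weights(m):  # hypothetical helper
    # Only touch Linear layers; other modules keep their defaults
    if isinstance(m, nn.Linear):
        init.xavier_normal_(m.weight)
        init.zeros_(m.bias)

torch.manual_seed(1)
net = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 2))
net.apply(init_weights)  # apply() calls the function on every submodule

# Xavier normal samples from N(0, std^2) with std = sqrt(2 / (fan_in + fan_out))
expected_std = (2.0 / (3 + 4)) ** 0.5
print(expected_std, net[0].weight.std().item())
```

With only 12 weights in the first layer, the sample standard deviation will only roughly match the expected value.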

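nn and autograd

nn.functional

Before discussing the relationship between nn and autograd, it helps to introduce another commonly used module in nn: nn.functional. Most layers implemented in nn have a corresponding function in functional. The main difference is that a layer implemented with nn.Module is a class, defined as class Layer(nn.Module), that automatically tracks its learnable parameters, whereas the functions in nn.functional behave more like pure functions, defined as def function(input).

When a layer has learnable parameters, such as Conv or BatchNorm, it is best to use nn.Module. Activation and pooling layers have no learnable parameters, so the corresponding functional versions can be used instead; there is no significant performance difference between the two. When building a model, nn.Module and nn.functional can be mixed: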

from torch.nn import functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 100)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

Layers with learnable parameters can also be written with functional, but this is more cumbersome because the parameters must be defined by hand:

class MyLinear(nn.Module):
    def __init__(self):
        super(MyLinear, self).__init__()
        self.weight = nn.Parameter(torch.zeros(4, 3))
        self.bias = nn.Parameter(torch.zeros(4))

    def forward(self, x):
        x = F.linear(x, self.weight, self.bias)
        return x
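
As a quick check that the module form and the functional form compute the same thing, the sketch below feeds the parameters of an nn.Linear into F.linear. It also shows one caveat I would add: functional dropout does not know about a module's training flag, so it has to be passed explicitly.

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)
layer = nn.Linear(3, 4)  # the module owns weight (4x3) and bias (4)
x = torch.randn(2, 3)

out_module = layer(x)
# Functional form: the same parameters are passed in explicitly
out_functional = F.linear(x, layer.weight, layer.bias)

# Stateful behavior such as dropout needs the training flag by hand
drop_eval = F.dropout(x, p=0.5, training=False)  # identity when not training
print(torch.allclose(out_module, out_functional), torch.equal(drop_eval, x))
```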

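The relationship between nn and autograd

nn is built on top of autograd, and its main job is to implement forward propagation. Every operation that nn.Module performs on an input Tensor ultimately relies on autograd.

The differences between autograd.Function, nn.functional and nn.Module are as follows:

autograd.Function extends autograd at the Tensor level by implementing a new operation (op) for it.
nn.functional is a collection of autograd operations, wrapped as functions.
nn.Module uses autograd to extend the functionality of nn. Networks are built with nn.Module as the basic unit, and an nn.Module usually wraps autograd operations that do the actual work. For example:
the forward method of nn.ReLU calls nn.functional.relu()
nn.functional.relu() in turn dispatches to a lower-level op, whose actual computation is usually implemented in C++.
If an operation is not yet supported by autograd, use Function to implement its forward and backward passes by hand.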
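
Writing the forward and backward passes by hand could look like the sketch below. The op itself (cubing a tensor) is a hypothetical example chosen because its derivative is easy to verify; autograd already supports it, so this is only meant to show the mechanics.

```python
import torch
from torch.autograd import Function, gradcheck

class Cube(Function):  # hypothetical op: y = x ** 3
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # keep x for the backward pass
        return x ** 3

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 3 * x ** 2  # chain rule with dy/dx = 3x^2

x = torch.tensor([1.0, 2.0, 3.0], dtype=torch.double, requires_grad=True)
y = Cube.apply(x)  # custom Functions are invoked through apply()
y.sum().backward()
print(x.grad)

# gradcheck compares the hand-written backward against finite differences
ok = gradcheck(Cube.apply, (x.detach().clone().requires_grad_(True),))
print(ok)
```

gradcheck expects double-precision inputs, which is why the tensor is created with dtype=torch.double.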

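Hooks

I have not studied hooks in depth. Put simply, they are a way to obtain a model's intermediate results (forward-pass outputs and backward-pass gradients). A forward hook has the form hook(module, input, output) -> None, and a backward hook has the form hook(module, grad_input, grad_output) -> Tensor or None. Hook functions should not modify their inputs or outputs, and should be removed promptly after use to avoid extra runtime overhead.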

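import torch
from torchvision import models

model = models.resnet34()
features = []  # filled in by the hook

def hook(module, input, output):
    # Store a detached copy of the intermediate output without modifying it
    features.append(output.detach().clone())

# torchvision's resnet34 exposes layer1..layer4; hook the last residual stage
handle = model.layer4.register_forward_hook(hook)
input = torch.randn(1, 3, 224, 224)
output = model(input)
# Remove the hook as soon as it is no longer needed
handle.remove()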
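
Backward hooks work in a similar way. The following is a minimal sketch on a small made-up network, assuming a PyTorch release recent enough to provide register_full_backward_hook; the dictionary grads and the function name backward_hook are my own.

```python
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))  # made-up network
grads = {}

def backward_hook(module, grad_input, grad_output):
    # grad_output: gradients w.r.t. the module's outputs
    # grad_input:  gradients w.r.t. the module's inputs
    grads["grad_output"] = grad_output[0].detach().clone()
    grads["grad_input"] = grad_input[0].detach().clone()
    return None  # returning None leaves the gradients unchanged

handle = net[0].register_full_backward_hook(backward_hook)
x = torch.randn(2, 4, requires_grad=True)
net(x).sum().backward()
handle.remove()  # remove the hook once the gradients are captured
print(grads["grad_output"].shape, grads["grad_input"].shape)
```

The input is created with requires_grad=True so that a gradient with respect to the first layer's input exists and can be captured.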

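Saving Models

In PyTorch, every Module object has a state_dict() method that returns all of the Module's current state. After saving this state, the next time the model is used it can be restored with load_state_dict().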

# save model
torch.save(net.state_dict(), 'net.pth')

# load model
net2 = Net()
net2.load_state_dict(torch.load('net.pth'))
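
A round trip through state_dict can be checked as in the sketch below. It uses a small nn.Linear as a stand-in for a trained model and an in-memory buffer instead of a .pth file, both of which are my own choices so that the example leaves nothing on disk.

```python
import io
import torch
from torch import nn

torch.manual_seed(0)
src = nn.Linear(3, 2)  # hypothetical stand-in for a trained model
buffer = io.BytesIO()  # in-memory file, so nothing is written to disk
torch.save(src.state_dict(), buffer)

buffer.seek(0)
dst = nn.Linear(3, 2)  # same architecture, different random initialization
dst.load_state_dict(torch.load(buffer))

x = torch.randn(4, 3)
same = torch.allclose(src(x), dst(x))
keys = list(src.state_dict().keys())
print(keys, same)
```

Because only tensors are stored, the receiving model must be constructed with the same architecture before load_state_dict is called.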

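There is another way to save a model, which pickles the entire Module object rather than only its state (recent PyTorch releases may require passing weights_only=False to torch.load for this to work):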

torch.save(net, 'net_all.pth')
net2 = torch.load('net_all.pth')

PyTorch currently provides an ONNX interface that can export a .pth model to an ONNX model.

GPU Computing

Running a Module on the GPU:

Move all model parameters to the GPU: model = model.cuda()
Move the input data to the GPU: input = input.cuda()
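
The two steps above can also be written so that the same script runs on machines without a GPU. This is a minimal sketch using torch.device and .to(), which is an alternative to calling .cuda() directly; the small nn.Linear model is a placeholder.

```python
import torch
from torch import nn

# Fall back to the CPU when no GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(3, 2).to(device)  # moves the parameters and returns the module
x = torch.randn(4, 3).to(device)    # tensors are not moved in place: reassign
out = model(x)
print(out.device)
```

The reassignment matters for tensors: calling x.to(device) or x.cuda() without using the return value leaves x where it was.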

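PyTorch provides two ways to compute in parallel on multiple GPUs. Their parameters are very similar: device_ids specifies which GPUs to run on, and output_device specifies which GPU receives the output. The difference is that nn.parallel.data_parallel runs the computation on multiple GPUs directly and returns the result, whereas nn.DataParallel returns a new module that is automatically parallelized across multiple GPUs.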

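# Multi-GPU parallel computation
'''
    DataParallel splits each input batch evenly into chunks, sends each chunk
    to its own GPU for computation, and then sums the gradients obtained on
    the individual GPUs. All data associated with the Module is also
    replicated to each GPU as shallow copies.
'''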
# method 1
new_net = nn.DataParallel(net, device_ids=[0, 1])

# method 2
output = nn.parallel.data_parallel(net, input, device_ids=[0, 1])
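
Since the two calls above assume that GPUs 0 and 1 exist, a guarded version may be more convenient when experimenting. The sketch below wraps the model only when at least two GPUs are visible and otherwise runs it unchanged; the nn.Linear model and the batch size are placeholders.

```python
import torch
from torch import nn

net = nn.Linear(3, 2)  # placeholder model
x = torch.randn(8, 3)  # a batch of 8 samples

n_gpu = torch.cuda.device_count()
if n_gpu >= 2:
    # The parameters must live on device_ids[0] before wrapping
    net = nn.DataParallel(net.cuda(0), device_ids=[0, 1], output_device=0)
    x = x.cuda(0)
# With fewer than two GPUs the plain module is used as is
out = net(x)
print(n_gpu, out.shape)
```

The output keeps the full batch dimension either way, because DataParallel gathers the per-GPU chunks back onto output_device.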