A Summary of the PyTorch Model-Building Workflow
I previously built a neural network by hand to tackle the MNIST recognition problem, but the results were poor. Since my research project happens to require PyTorch, which I've now been studying for a while, this post summarizes the model-building workflow, and along the way checks how far my hand-written network falls short of torch~~
With PyTorch, the parts you mainly have to implement yourself are defining your own network and defining your own training loop. Both are covered below, and at the end the model is trained on the same MNIST dataset as before.
Subclassing Module to implement your own model
nn.Module is the model-construction base class; we can subclass it to build any model we want. In the new class, only two of Module's methods need to be overridden: __init__() and forward().
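As a bare-bones sketch of that shape (the class and layer here are placeholders of my own, not from any library):

import torch
from torch import nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()            # call the parent constructor first
        self.layer = nn.Linear(4, 2)  # declare layers / parameters here

    def forward(self, x):
        return self.layer(x)          # wire them up; invoked via model(x)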
__init__() is where the network's layers are defined: every layer the network needs is instantiated here. The commonly needed layers all seem to be provided by torch already, but a layer can just as well be something we define ourselves; for example, a network we have implemented can itself be added to a larger network as one of its layers. After that, the model parameters are initialized. Take Bert's source code as an example:
Excerpt from the Bert source:
def __init__(self, config, add_pooling_layer=True):
    super().__init__(config)
    self.config = config

    self.embeddings = BertEmbeddings(config)
    self.encoder = BertEncoder(config)
    self.pooler = BertPooler(config) if add_pooling_layer else None

    self.init_weights()
So why does instantiating models inside our custom class tie them together for training? Because model parameters are all instances of the Parameter class: whenever a sub-Module is assigned to a Module, or a Parameter is assigned to a Module, it is automatically added to the model's parameter list, which is what makes the later automatic gradient computation possible.
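To see this registration in action, a quick sketch (the Demo class is a placeholder of mine):

import torch
from torch import nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3, 2)                 # sub-module: registered automatically
        self.scale = nn.Parameter(torch.ones(1))  # bare Parameter: also registered

for name, p in Demo().named_parameters():
    print(name, tuple(p.shape))
# scale (1,)
# fc.weight (2, 3)
# fc.bias (2,)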
The second method, forward(), also needs to be overridden; it specifies the order in which the layers defined earlier are computed. The layers defined in __init__() are merely registered as parts of the model and carry no inherent order (the Sequential class offers a more convenient way to build a model, where the order layers are added is the order they run, but that is less flexible than writing your own class; a Sequential version of the network below is sketched after the class definition). The computation itself usually just runs in sequence: the first layer's output becomes the second layer's input, and so on, until the output layer's output is returned (nesting dolls all the way down, classic me~~~). Taking Bert as the example again, keeping only the important part:
embedding_output = self.embeddings(
    input_ids=input_ids,
    position_ids=position_ids,
    token_type_ids=token_type_ids,
    inputs_embeds=inputs_embeds,
    past_key_values_length=past_key_values_length,
)
encoder_outputs = self.encoder(
    embedding_output,
    attention_mask=extended_attention_mask,
    head_mask=head_mask,
    encoder_hidden_states=encoder_hidden_states,
    encoder_attention_mask=encoder_extended_attention_mask,
    past_key_values=past_key_values,
    use_cache=use_cache,
    output_attentions=output_attentions,
    output_hidden_states=output_hidden_states,
    return_dict=return_dict,
)
sequence_output = encoder_outputs[0]
pooled_output = self.pooler(sequence_output) if self.pooler is not None else None

if not return_dict:
    return (sequence_output, pooled_output) + encoder_outputs[1:]

return BaseModelOutputWithPoolingAndCrossAttentions(
    last_hidden_state=sequence_output,
    pooler_output=pooled_output,
    past_key_values=encoder_outputs.past_key_values,
    hidden_states=encoder_outputs.hidden_states,
    attentions=encoder_outputs.attentions,
    cross_attentions=encoder_outputs.cross_attentions,
)
With that, we can define our neural network as follows:
import torch
from torch import nn

class ANN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_linear = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.output_linear = nn.Linear(hidden_size, output_size)
        # initialize all parameters (weights and biases) from N(0, 0.01)
        for params in self.parameters():
            nn.init.normal_(params, mean=0, std=0.01)

    def forward(self, x):
        hid_out = self.hidden_linear(x)
        relu_out = self.relu(hid_out)
        output = self.output_linear(relu_out)
        return output
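For comparison with the Sequential class mentioned above, the same network could be sketched like this (my own illustration, not part of the original code; the sizes match the MNIST setup below):

net = nn.Sequential(
    nn.Linear(784, 100),  # hidden layer
    nn.ReLU(),
    nn.Linear(100, 10),   # output layer
)
# the same N(0, 0.01) initialization can then be applied
for params in net.parameters():
    nn.init.normal_(params, mean=0, std=0.01)

Here the order the layers are added is exactly the order they execute in, which is the flexibility trade-off noted earlier.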
Defining your own training loop
The main flow of one training round can be summarized as follows:
- compute the network's output y_hat for the input x
- compute the loss between the prediction y_hat and the true label y
- back-propagate the loss (i.e., compute the gradients)
- update the weights with the optimization algorithm
From this flow we can see that we need to prepare the data ourselves, choose a loss function, and choose an optimization method. The details are shown in the program below:
def train(net, train_data, train_labels, batch_size, epoch_num, learn_rate):
    # wrap the numpy data in tensors and build a shuffled mini-batch iterator
    train_features = torch.tensor(train_data, dtype=torch.float)
    train_labels = torch.tensor(train_labels, dtype=torch.float)
    train_data = torch.utils.data.TensorDataset(train_features, train_labels)
    train_iter = torch.utils.data.DataLoader(train_data, batch_size, shuffle=True)

    loss = nn.MSELoss()
    optimizer = torch.optim.SGD(net.parameters(), lr=learn_rate)

    for epoch in range(epoch_num):
        l_sum, n, acc_sum, n_ = 0.0, 0.0, 0.0, 0.0
        for X, y in train_iter:
            y_hat = net(X.float())
            l = loss(y_hat.float(), y.float()).sum()
            optimizer.zero_grad()  # clear gradients from the previous step
            l.backward()           # back-propagate
            optimizer.step()       # update the weights
            l_sum += l.item()
            n += y.shape[0]
            # labels are one-hot, so compare argmax of prediction and target
            acc_sum += (y_hat.argmax(dim=1) == y.argmax(dim=1)).float().sum().item()
            n_ += y_hat.shape[0]
        print("Epoch %d loss %.4f acc %.4f" % (epoch + 1, l_sum / n, acc_sum / n_))
Test results
train_data, train_labels, test_data, test_labels = loadData()

net = ANN(784, 100, 10)
batch_size = 10
epoch_num = 40
learn_rate = 0.3

train(net, train_data, train_labels, batch_size, epoch_num, learn_rate)
The training output is as follows:
Epoch 1 loss 0.0036 acc 0.8083
Epoch 2 loss 0.0019 acc 0.9280
Epoch 3 loss 0.0015 acc 0.9520
Epoch 4 loss 0.0013 acc 0.9640
Epoch 5 loss 0.0012 acc 0.9737
Epoch 6 loss 0.0010 acc 0.9814
Epoch 7 loss 0.0009 acc 0.9863
Epoch 8 loss 0.0008 acc 0.9900
Epoch 9 loss 0.0008 acc 0.9900
Epoch 10 loss 0.0007 acc 0.9914
Epoch 11 loss 0.0007 acc 0.9914
Epoch 12 loss 0.0006 acc 0.9940
Epoch 13 loss 0.0006 acc 0.9946
Epoch 14 loss 0.0005 acc 0.9954
Epoch 15 loss 0.0005 acc 0.9960
Epoch 16 loss 0.0005 acc 0.9966
Epoch 17 loss 0.0004 acc 0.9966
Epoch 18 loss 0.0004 acc 0.9974
Epoch 19 loss 0.0004 acc 0.9969
Epoch 20 loss 0.0003 acc 0.9974
Epoch 21 loss 0.0003 acc 0.9980
Epoch 22 loss 0.0003 acc 0.9983
Epoch 23 loss 0.0003 acc 0.9986
Epoch 24 loss 0.0003 acc 0.9986
Epoch 25 loss 0.0003 acc 0.9983
Epoch 26 loss 0.0002 acc 0.9986
Epoch 27 loss 0.0002 acc 0.9986
Epoch 28 loss 0.0002 acc 0.9983
Epoch 29 loss 0.0002 acc 0.9986
Epoch 30 loss 0.0002 acc 0.9989
Epoch 31 loss 0.0002 acc 0.9989
Epoch 32 loss 0.0002 acc 0.9989
Epoch 33 loss 0.0002 acc 0.9989
Epoch 34 loss 0.0002 acc 0.9989
Epoch 35 loss 0.0002 acc 0.9989
Epoch 36 loss 0.0002 acc 0.9989
Epoch 37 loss 0.0002 acc 0.9989
Epoch 38 loss 0.0002 acc 0.9989
Epoch 39 loss 0.0001 acc 0.9989
Epoch 40 loss 0.0001 acc 0.9989
Damn, that stings~~ It takes less than a minute to train to 99.89% accuracy (on the training set, admittedly)~ so the network I wrote by hand really did have problems.
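Since loadData() also returns a held-out test split, a short evaluation pass could check the test accuracy as well; a sketch I'm adding here (assuming the same one-hot label format, it was not part of the original run):

def evaluate(net, test_data, test_labels):
    # forward pass only; no gradients are needed for evaluation
    with torch.no_grad():
        X = torch.tensor(test_data, dtype=torch.float)
        y = torch.tensor(test_labels, dtype=torch.float)
        y_hat = net(X)
        acc = (y_hat.argmax(dim=1) == y.argmax(dim=1)).float().mean().item()
    return acc

print("test acc %.4f" % evaluate(net, test_data, test_labels))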
Code summary
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


def standardization(x):
    return (x - np.mean(x)) / np.std(x)


def loadImageData(trainingDirName='data/', test_ratio=0.3):
    from os import listdir

    # read every bmp image under data/0 ... data/9 as a flat 784-vector
    data = np.empty(shape=(0, 784))
    labels = np.empty(shape=(0, 1))
    for num in range(10):
        dirName = trainingDirName + '%s/' % (num)
        nowNumList = [i for i in listdir(dirName) if i[-3:] == 'bmp']
        labels = np.append(labels, np.full(shape=(len(nowNumList), 1), fill_value=num), axis=0)
        for aNumDir in nowNumList:
            imageDir = dirName + aNumDir
            image = plt.imread(imageDir).reshape((1, 784))
            data = np.append(data, image, axis=0)

    # shuffle, then split into train / test sets
    m = data.shape[0]
    shuffled_indices = np.random.permutation(m)
    test_set_size = int(m * test_ratio)
    test_indices = shuffled_indices[:test_set_size]
    train_indices = shuffled_indices[test_set_size:]

    trainData = data[train_indices]
    trainLabels = labels[train_indices]
    testData = data[test_indices]
    testLabels = labels[test_indices]

    # standardize both splits with the training set's statistics
    tmean = np.mean(trainData)
    tstd = np.std(trainData)
    trainData = (trainData - tmean) / tstd
    testData = (testData - tmean) / tstd

    return trainData, trainLabels, testData, testLabels


def OneHotEncoder(labels, Label_class):
    one_hot_label = np.array([[int(i == int(labels[j])) for i in range(Label_class)]
                              for j in range(len(labels))])
    return one_hot_label


def loadData(trainingDirName='data/', test_ratio=0.3):
    trainData, trainLabels, testData, testLabels = loadImageData(trainingDirName, test_ratio)
    train_data = np.matrix(trainData)
    train_labels = OneHotEncoder(trainLabels, 10)
    test_data = np.matrix(testData)
    test_labels = OneHotEncoder(testLabels, 10)
    return train_data, train_labels, test_data, test_labels


import torch
from torch import nn


class ANN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_linear = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.output_linear = nn.Linear(hidden_size, output_size)
        for params in self.parameters():
            nn.init.normal_(params, mean=0, std=0.01)

    def forward(self, x):
        hid_out = self.hidden_linear(x)
        relu_out = self.relu(hid_out)
        output = self.output_linear(relu_out)
        return output


def train(net, train_data, train_labels, batch_size, epoch_num, learn_rate):
    train_features = torch.tensor(train_data, dtype=torch.float)
    train_labels = torch.tensor(train_labels, dtype=torch.float)
    train_data = torch.utils.data.TensorDataset(train_features, train_labels)
    train_iter = torch.utils.data.DataLoader(train_data, batch_size, shuffle=True)

    loss = nn.MSELoss()
    optimizer = torch.optim.SGD(net.parameters(), lr=learn_rate)

    for epoch in range(epoch_num):
        l_sum, n, acc_sum, n_ = 0.0, 0.0, 0.0, 0.0
        for X, y in train_iter:
            y_hat = net(X.float())
            l = loss(y_hat.float(), y.float()).sum()
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            l_sum += l.item()
            n += y.shape[0]
            acc_sum += (y_hat.argmax(dim=1) == y.argmax(dim=1)).float().sum().item()
            n_ += y_hat.shape[0]
        print("Epoch %d loss %.4f acc %.4f" % (epoch + 1, l_sum / n, acc_sum / n_))


train_data, train_labels, test_data, test_labels = loadData()

net = ANN(784, 100, 10)
batch_size = 10
epoch_num = 40
learn_rate = 0.3

train(net, train_data, train_labels, batch_size, epoch_num, learn_rate)