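The d2l package essentially exists to store functions defined in earlier sections. Each function is recorded automatically the first time it appears with the @save mark, so later sections can call it directly. This means d2l can be replaced entirely by copying those functions into your own code.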
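3.4 Softmax regression
Regression vs. classification
Regression answers "how much?"; classification answers "which one?".
Often we only care about hard categories (which class a sample belongs to), yet we still use models that output soft categories (the probability of belonging to each class).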
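3.4.1 Classification problems
A common way to represent categorical data is one-hot encoding. A one-hot encoding is a vector with as many components as there are categories: the component for the sample's category is set to 1 and all other components are set to 0. For example, with three categories: y ∈ {(1,0,0), (0,1,0), (0,0,1)}.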
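A minimal sketch of one-hot encoding in plain Python, assuming class labels are given as integer indices 0, 1, ..., num_classes - 1. The helper name `one_hot` is only for illustration.

```python
def one_hot(label, num_classes):
    """Return a one-hot list: 1 at position `label`, 0 everywhere else."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} out of range for {num_classes} classes")
    vec = [0] * num_classes
    vec[label] = 1
    return vec

# Three categories, matching the example above: class indices 0, 1, 2
encodings = [one_hot(k, 3) for k in range(3)]
print(encodings)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

In PyTorch the same encoding is available as `torch.nn.functional.one_hot(torch.tensor([0, 1, 2]), num_classes=3)`.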
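3.4.2 Network architecture
To estimate the conditional probabilities of all possible classes, we need a model with multiple outputs, one per class. To handle classification with linear models, we need as many affine functions as there are outputs, and each output has its own affine function. In vector form: o = Wx + b.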
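To make "one affine function per output" concrete, here is a small sketch with made-up numbers, assuming 4 input features and 3 classes. Row j of W together with b_j is the affine function that produces output o_j. The weights below are hypothetical and exist only for illustration.

```python
def affine(W, x, b):
    """Compute o = Wx + b; W has one row per output and one column per input feature."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j
            for row, b_j in zip(W, b)]

# Hypothetical toy parameters: 4 input features, 3 classes -> W is 3x4, b has 3 entries
W = [[0.1, 0.2, 0.0, -0.1],
     [0.0, -0.3, 0.5, 0.2],
     [0.4, 0.1, -0.2, 0.0]]
b = [0.0, 0.1, -0.1]
x = [1.0, 2.0, 3.0, 4.0]

o = affine(W, x, b)  # one unnormalized score (logit) per class
print(o)             # approximately [0.1, 1.8, -0.1]
```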
3.4.3 Parameter cost of fully connected layers
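A quick way to see the cost: a fully connected layer that maps d inputs to q outputs holds a q × d weight matrix plus q biases. Its parameter count therefore grows as O(dq). This is a minimal sketch, and the helper name is hypothetical.

```python
def fc_param_count(d, q):
    """Parameters of a fully connected layer mapping d inputs to q outputs:
    a q x d weight matrix plus a length-q bias vector."""
    return d * q + q

# Fashion-MNIST softmax regression: 28 * 28 = 784 inputs, 10 classes
print(fc_param_count(784, 10))  # 7850
```

This should agree with `sum(p.numel() for p in nn.Linear(784, 10).parameters())` for the PyTorch layer used in 3.7.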
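3.4.4 The softmax operation
Requirements:
① We want the model output $\hat{y}_j$ to be interpretable as the probability that the sample belongs to class $j$. We then choose the class with the largest output, $\operatorname{argmax}_j \hat{y}_j$, as the prediction.
② **The unnormalized predictions $\mathbf{o}$ cannot be treated directly as the outputs of interest**, because interpreting the linear layer's outputs directly as probabilities causes problems. Nothing constrains these numbers to sum to 1, and depending on the input they can be negative. Both violate the basic axioms of probability.
③ We need a training objective that encourages the model to estimate probabilities faithfully. For example, among all samples for which a classifier outputs 0.5, we would like half to actually belong to the predicted class. This property is called calibration.
A model that meets these conditions produces:
$$\hat{\mathbf{y}} = \mathrm{softmax}(\mathbf{o}), \quad \text{where} \quad \hat{y}_j = \frac{\exp(o_j)}{\sum_k \exp(o_k)}$$
Exponentiation makes every output non-negative, and dividing by the sum makes the outputs sum to 1. Because $\exp$ is increasing, softmax preserves the ordering, so $\operatorname{argmax}_j \hat{y}_j = \operatorname{argmax}_j o_j$.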
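A plain-Python sketch of $\hat{y}_j = \exp(o_j) / \sum_k \exp(o_k)$ applied to a single vector of logits. It checks the properties listed above: outputs in (0, 1), a sum of 1, and an unchanged argmax. The logits are made up. This naive form can overflow for large logits, a problem these notes return to in 3.7.

```python
import math

def softmax(o):
    """Map logits o to probabilities: y_hat_j = exp(o_j) / sum_k exp(o_k)."""
    exps = [math.exp(o_j) for o_j in o]   # all strictly positive
    total = sum(exps)
    return [e / total for e in exps]      # normalized so the entries sum to 1

def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=values.__getitem__)

o = [0.1, 1.8, -0.1]        # hypothetical logits; note that one is negative
y_hat = softmax(o)

print(y_hat)                # every entry lies strictly between 0 and 1
print(sum(y_hat))           # 1.0, up to floating-point rounding
print(argmax(o) == argmax(y_hat))  # True: exp is increasing, so the predicted class is unchanged
```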
3.4.5 Vectorization for minibatches
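A minimal PyTorch sketch of minibatch vectorization, assuming toy sizes (n = 2 samples, d = 4 features, q = 3 classes) and random data. Stacking samples as the rows of X turns the per-sample o = Wx + b into a single matrix product O = XW + b. Here W is stored with shape (d, q), the same layout as W in the from-scratch code, and softmax is then applied to each row.

```python
import torch

torch.manual_seed(0)
n, d, q = 2, 4, 3                         # batch size, input features, classes (toy sizes)
X = torch.randn(n, d)                     # one sample per row
W = torch.normal(0, 0.01, size=(d, q))    # layout (num_inputs, num_outputs)
b = torch.zeros(q)

O = X @ W + b                             # shape (n, q); b is broadcast to every row
Y_hat = torch.softmax(O, dim=1)           # softmax applied row by row

print(O.shape)                            # torch.Size([2, 3])
print(Y_hat.sum(dim=1))                   # each row sums to 1
```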
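3.4.6 Loss function: maximum likelihood estimation
Log-likelihood
Softmax and its derivative
Cross-entropy loss
We use (3.4.8), $l(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_j y_j \log \hat{y}_j$, to define the loss $l$. It is the expected value of the loss under the distribution over labels. This loss is called the cross-entropy loss, and it is one of the most commonly used losses for classification problems.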
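For "softmax and its derivative", the standard result is as follows. If $\hat{\mathbf{y}} = \mathrm{softmax}(\mathbf{o})$ and $l = -\sum_j y_j \log \hat{y}_j$ with a one-hot $\mathbf{y}$, then $\partial l / \partial o_j = \mathrm{softmax}(\mathbf{o})_j - y_j$. The sketch below checks this with PyTorch autograd. The logits and label are made up.

```python
import torch

# Hypothetical logits for one sample with 3 classes; the true class is 2
o = torch.tensor([0.1, 1.8, -0.1], requires_grad=True)
y = torch.tensor([0.0, 0.0, 1.0])              # one-hot label

y_hat = torch.softmax(o, dim=0)
loss = -(y * torch.log(y_hat)).sum()           # l(y, y_hat) = -sum_j y_j log y_hat_j
loss.backward()

# Derivative of the softmax cross-entropy with respect to the logits: softmax(o) - y
expected_grad = y_hat.detach() - y
print(loss.item())                             # equals -log(y_hat[2]) for this one-hot label
print(torch.allclose(o.grad, expected_grad, atol=1e-6))  # True
```

With a one-hot label, only the true-class term survives the sum. This is why the from-scratch `cross_entropy` only needs to pick out `y_hat[range(len(y_hat)), y]`.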
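3.4.7 Information theory basics
Entropy
Surprisal
Compression and prediction: data that is easy to predict is also easy to compress
Cross-entropy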
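A small sketch of these quantities in bits (log base 2), using two made-up distributions. Surprisal is $-\log_2 p$. Entropy $H(P)$ is the expected surprisal under $P$. Cross-entropy $H(P, Q)$ is the expected surprisal when data drawn from $P$ is described with a model $Q$. $H(P, Q)$ is never smaller than $H(P)$: a worse model of the data needs more bits, which is the compression/prediction link noted above.

```python
import math

def surprisal(p):
    """Information content -log2(p) of an event with probability p, in bits."""
    return -math.log2(p)

def entropy(P):
    """H(P) = -sum_j P_j log2 P_j (terms with P_j = 0 contribute 0)."""
    return sum(p * surprisal(p) for p in P if p > 0)

def cross_entropy(P, Q):
    """H(P, Q) = -sum_j P_j log2 Q_j: expected surprisal of data from P under model Q."""
    return sum(p * surprisal(q) for p, q in zip(P, Q) if p > 0)

P = [0.5, 0.25, 0.25]     # hypothetical true distribution
Q = [1/3, 1/3, 1/3]       # hypothetical model distribution

print(entropy(P))                          # 1.5 bits
print(cross_entropy(P, Q))                 # log2(3), about 1.585 bits, not less than H(P)
print(cross_entropy(P, P) == entropy(P))   # True: the lower bound is reached when Q = P
```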
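3.4.8 Model prediction and evaluation
After training a softmax regression model, we can predict the probability of each output class for any sample's features. Normally we use the class with the highest predicted probability as the output class. A prediction is correct if it agrees with the actual class (the label). In the experiments that follow, we use accuracy to evaluate model performance. Accuracy is the ratio of the number of correct predictions to the total number of predictions.
Summary
• The softmax operation takes a vector and maps it to probabilities.
• Softmax regression applies to classification problems; it uses the probability distribution over output classes produced by the softmax operation.
• Cross-entropy is a good measure of the difference between two probability distributions: it measures the number of bits needed to encode the data given our model.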
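A plain-Python sketch of accuracy as defined above. Predict the class with the highest probability, then divide the number of matches by the number of samples. The probabilities and labels are toy values. Note that the `accuracy` function in the from-scratch code returns the count of correct predictions, and `evaluate_accuracy` does the division.

```python
def accuracy(y_hat, y):
    """Fraction of samples whose highest-probability class equals the label."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in y_hat]
    correct = sum(int(p == label) for p, label in zip(preds, y))
    return correct / len(y)

# Two samples, three classes (toy numbers)
y_hat = [[0.1, 0.3, 0.6],
         [0.3, 0.2, 0.5]]
y = [0, 2]
print(accuracy(y_hat, y))  # 0.5: sample 0 is predicted as class 2 (wrong), sample 1 as class 2 (right)
```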
3.6 Implementation of softmax regression from scratch
Cross-entropy loss function
import torch
import torchvision
from torch.utils import data
from torchvision import transforms
import matplotlib.pyplot as plt
def get_dataloader_workers():
    # Load data in the main process (no worker subprocesses)
    return 0

def load_data_fashion_mnist(batch_size, resize=None):
    """Download Fashion-MNIST and return (train_iter, test_iter) DataLoaders."""
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root="../data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(root="../data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True, num_workers=get_dataloader_workers()),
            data.DataLoader(mnist_test, batch_size, shuffle=False, num_workers=get_dataloader_workers()))

batch_size = 256
train_iter, test_iter = load_data_fashion_mnist(batch_size)
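# Each 28x28 image is flattened into a 784-dimensional vector; there are 10 classes
num_inputs = 784
num_outputs = 10
W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
b = torch.zeros(num_outputs, requires_grad=True)

def softmax(X):
    # Row-wise softmax: exponentiate, then divide each row by its sum
    X_exp = torch.exp(X)
    partition = X_exp.sum(1, keepdim=True)
    return X_exp / partition  # broadcasting divides every entry by its row's sum

def net(X):
    # Flatten each image into a row vector, then compute softmax(XW + b)
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)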
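# Toy example: 2 samples, 3 classes; y holds the true class indices
y = torch.tensor([0, 2])
y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])  # each row sums to 1

def cross_entropy(y_hat, y):
    # y_hat[range(n), y] picks the predicted probability of the true class for each sample
    return - torch.log(y_hat[range(len(y_hat)), y])

print(cross_entropy(y_hat, y))  # tensor([2.3026, 0.6931])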
def accuracy(y_hat, y):
    """Return the number of correct predictions (a count, not a ratio)."""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)  # predicted class = index of the largest probability
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())

class Accumulator:
    """Accumulate running sums over n variables."""
    def __init__(self, n):
        self.data = [0.0] * n
    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]
    def reset(self):
        self.data = [0.0] * len(self.data)
    def __getitem__(self, idx):
        return self.data[idx]

def evaluate_accuracy(net, data_iter):
    """Compute the model's accuracy over a whole dataset."""
    if isinstance(net, torch.nn.Module):
        net.eval()  # switch to evaluation mode
    metric = Accumulator(2)  # number of correct predictions, number of predictions
    with torch.no_grad():
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]
def train_epoch_ch3(net, train_iter, loss, updater):
    """Train for one epoch; return (average loss, training accuracy)."""
    if isinstance(net, torch.nn.Module):
        net.train()  # switch to training mode
    metric = Accumulator(3)  # sum of losses, number of correct predictions, number of samples
    for X, y in train_iter:
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # Built-in optimizer: backpropagate the mean loss, then step
            updater.zero_grad()
            l.mean().backward()
            updater.step()
        else:
            # Custom updater: backpropagate the summed loss; sgd divides by the batch size
            l.sum().backward()
            updater(X.shape[0])
        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    return metric[0] / metric[2], metric[1] / metric[2]

def sgd(params, lr, batch_size):
    """Minibatch stochastic gradient descent."""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):
    for epoch in range(num_epochs):
        train_loss, train_acc = train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = evaluate_accuracy(net, test_iter)
        print(f'epoch {epoch + 1}, train_loss {train_loss:f}, train_acc {train_acc:f}, '
              f'test_acc {test_acc:f}')

lr = 0.1

def updater(batch_size):
    return sgd([W, b], lr, batch_size)

num_epochs = 20
train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, updater)
def get_fashion_mnist_labels(labels):
    """Convert numeric labels into text labels."""
    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat', 'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [text_labels[int(i)] for i in labels]

def show_images(imgs, num_rows, num_cols, titles=None):
    """Plot a grid of images with optional titles."""
    _, axes = plt.subplots(num_rows, num_cols)
    axes = axes.flatten()
    for i, (ax, img) in enumerate(zip(axes, imgs)):
        if torch.is_tensor(img):
            ax.imshow(img.numpy())
        else:
            ax.imshow(img)
        ax.axes.get_xaxis().set_visible(False)
        ax.axes.get_yaxis().set_visible(False)
        if titles:
            ax.set_title(titles[i])
    plt.show()

def predict_ch3(net, test_iter, n=6):
    """Show the first n test images with the true label (top) and predicted label (bottom)."""
    X, y = next(iter(test_iter))  # take a single batch
    trues = get_fashion_mnist_labels(y)
    preds = get_fashion_mnist_labels(net(X).argmax(axis=1))
    titles = [true + '\n' + pred for true, pred in zip(trues, preds)]
    show_images(X[0:n].reshape((n, 28, 28)), 1, n, titles=titles[0:n])

predict_ch3(net, test_iter)
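Summary
• With softmax regression, we can train models for multiclass classification.
• The training loop for softmax regression is very similar to that for linear regression: first read the data, then define the model and the loss function, then train the model with an optimization algorithm. Most common deep learning models have similar training procedures.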
3.7 Concise implementation of softmax regression
import torch
from torch import nn
import torchvision
from torch.utils import data
from torchvision import transforms

def get_dataloader_workers():
    # Load data in the main process (no worker subprocesses)
    return 0

def load_data_fashion_mnist(batch_size, resize=None):
    """Download Fashion-MNIST and return (train_iter, test_iter) DataLoaders."""
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root="../data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(root="../data", train=False, transform=trans, download=True)
    return (data.DataLoader(mnist_train, batch_size, shuffle=True, num_workers=get_dataloader_workers()),
            data.DataLoader(mnist_test, batch_size, shuffle=False, num_workers=get_dataloader_workers()))
def init_weights(m):
    # Initialize the weights of every linear layer from N(0, 0.01^2)
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

class Accumulator:
    """Accumulate running sums over n variables."""
    def __init__(self, n):
        self.data = [0.0] * n
    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]
    def reset(self):
        self.data = [0.0] * len(self.data)
    def __getitem__(self, idx):
        return self.data[idx]

def accuracy(y_hat, y):
    """Return the number of correct predictions."""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)  # the argmax of the logits equals the argmax of their softmax
    cmp = y_hat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())

def evaluate_accuracy(net, data_iter):
    """Compute the model's accuracy over a whole dataset."""
    if isinstance(net, torch.nn.Module):
        net.eval()
    metric = Accumulator(2)  # number of correct predictions, number of predictions
    with torch.no_grad():
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]

def train_epoch_ch3(net, train_iter, loss, updater):
    """Train for one epoch; return (average loss, training accuracy)."""
    if isinstance(net, torch.nn.Module):
        net.train()
    metric = Accumulator(3)  # sum of losses, number of correct predictions, number of samples
    for X, y in train_iter:
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            updater.zero_grad()
            l.mean().backward()
            updater.step()
        else:
            l.sum().backward()
            updater(X.shape[0])
        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    return metric[0] / metric[2], metric[1] / metric[2]

def sgd(params, lr, batch_size):
    """Minibatch stochastic gradient descent."""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):
    for epoch in range(num_epochs):
        train_loss, train_acc = train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = evaluate_accuracy(net, test_iter)
        print(f'epoch {epoch + 1}, train_loss {train_loss:f}, train_acc {train_acc:f}, '
              f'test_acc {test_acc:f}')
batch_size = 256
train_iter, test_iter = load_data_fashion_mnist(batch_size)
# Flatten each 28x28 image, then one linear layer maps the 784 inputs to 10 logits
net = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
net.apply(init_weights)
# CrossEntropyLoss takes the raw logits and applies log-softmax internally
loss = nn.CrossEntropyLoss(reduction='none')
trainer = torch.optim.SGD(net.parameters(), lr=0.1)
num_epochs = 10
train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
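loss = nn.CrossEntropyLoss(reduction='none')
Softmax revisited (fixing numerical instability: overflow and underflow when computing exponentials)
Solution: combine softmax with the cross-entropy loss.
Overflow: if some $o_k$ is large, $\exp(o_k)$ overflows. Subtracting $\max_k o_k$ from every $o_j$ before exponentiating does not change the result:
$$\hat{y}_j = \frac{\exp(o_j - \max_k o_k)\exp(\max_k o_k)}{\sum_k \exp(o_k - \max_k o_k)\exp(\max_k o_k)} = \frac{\exp(o_j - \max_k o_k)}{\sum_k \exp(o_k - \max_k o_k)}$$
Underflow and exponentials: after the subtraction, some $o_j - \max_k o_k$ may be very negative. Then $\exp(\cdot)$ rounds to 0, and $\log \hat{y}_j$ becomes $-\infty$. Taking the logarithm analytically avoids exponentiating those terms on their own:
$$\log \hat{y}_j = o_j - \max_k o_k - \log \sum_k \exp(o_k - \max_k o_k)$$
We want to keep the conventional softmax function in case we need to evaluate the probabilities output by the model. However, instead of passing softmax probabilities into the loss function, we pass the unnormalized predictions into the cross-entropy loss and compute softmax and its logarithm together. This is a clever approach similar to the "LogSumExp trick".
My understanding: the result is the same, but the intermediate steps are simplified algebraically so that we compute quantities that are less prone to numerical problems. The two forms are mathematically equivalent.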
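A sketch of why this matters, using made-up logits that include a very large and a very small value. The naive softmax produces `nan`. The LogSumExp form, $\log \hat{y}_j = o_j - \max_k o_k - \log \sum_k \exp(o_k - \max_k o_k)$, stays finite. It also agrees with `torch.log_softmax` and with `nn.CrossEntropyLoss`, which takes the raw logits.

```python
import torch
from torch import nn

logits = torch.tensor([[1000.0, 0.0, -1000.0],
                       [2.0, 1.0, 0.1]])
labels = torch.tensor([0, 2])

# Naive route: exp(1000) overflows to inf in float32, and inf / inf gives nan
naive = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)
print(naive[0])  # the first entry is nan

# LogSumExp trick: subtract the row maximum before exponentiating, and never take log of a rounded 0
m = logits.max(dim=1, keepdim=True).values
log_probs = logits - m - torch.log(torch.exp(logits - m).sum(dim=1, keepdim=True))

# nn.CrossEntropyLoss works on the raw logits and gives the same per-sample losses
loss = nn.CrossEntropyLoss(reduction='none')(logits, labels)
manual = -log_probs[range(len(labels)), labels]
print(torch.allclose(loss, manual))                                # True
print(torch.allclose(log_probs, torch.log_softmax(logits, dim=1)))  # True
```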