By a stroke of luck, I recently worked through the computation of several common loss functions in detail again, and I am writing it down here for future reference.
To keep the notation clear, let $y_n \in \{1, 2, \ldots, K\}$ be the true label of sample $n$, and let $v = (v_1, v_2, \ldots, v_K)$ be the network output, i.e., the prediction for sample $n$. Let $N$ be the number of samples in a batch (the batch size) and $K$ the number of classes in the classification task.

For a consistent running example throughout this post, we fix the network output $preds$ and the labels $target$: there are only two samples, one with label $2$ and the other with label $3$.

$$preds = [[0.1, 0.2, 0.3, 0.4],\ [0.1, 0.1, 0.1, 0.1]]$$

$$target = [2, 3]$$
I. NLLLoss and the CrossEntropy_Loss cross-entropy loss
1. Softmax
Softmax is the first operation applied to the network output. For component $k$ it can be written as:

$$\mathrm{softmax}(v)_k = \frac{e^{v_k}}{\sum_{m=1}^{K} e^{v_m}}$$

Since the raw network outputs can be positive or negative and of arbitrary magnitude, Softmax rescales them into $[0, 1]$ (and makes them sum to 1), so that they can be compared as probabilities. For our example:
$$[0.1, 0.2, 0.3, 0.4] \xrightarrow{\text{softmax}} \left[\frac{e^{0.1}}{S_1}, \frac{e^{0.2}}{S_1}, \frac{e^{0.3}}{S_1}, \frac{e^{0.4}}{S_1}\right] = [0.2138, 0.2363, 0.2612, 0.2887]$$

$$[0.1, 0.1, 0.1, 0.1] \xrightarrow{\text{softmax}} \left[\frac{e^{0.1}}{S_2}, \frac{e^{0.1}}{S_2}, \frac{e^{0.1}}{S_2}, \frac{e^{0.1}}{S_2}\right] = [0.2500, 0.2500, 0.2500, 0.2500]$$

where $S_1 = e^{0.1} + e^{0.2} + e^{0.3} + e^{0.4}$ and $S_2 = e^{0.1} + e^{0.1} + e^{0.1} + e^{0.1}$.
The code implementation is:
import torch
import torch.nn.functional as F

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])
exp = torch.exp(preds)                        # element-wise exponential
sum_ = torch.sum(exp, dim=1).reshape(-1, 1)   # per-sample normalizer S
softmax = exp / sum_
print('Manually computed softmax:\n', softmax)
softmax_ = F.softmax(preds, dim=1)            # built-in softmax
print('F.softmax result:\n', softmax_)
The two outputs are identical:

Manually computed softmax:
 tensor([[0.2138, 0.2363, 0.2612, 0.2887],
        [0.2500, 0.2500, 0.2500, 0.2500]])
F.softmax result:
 tensor([[0.2138, 0.2363, 0.2612, 0.2887],
        [0.2500, 0.2500, 0.2500, 0.2500]])
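A side note that is not part of the original walkthrough: when the logits are large, exponentiating them directly can overflow. Subtracting the per-row maximum first leaves the softmax mathematically unchanged and is the usual way to keep the computation numerically stable; a minimal sketch:

import torch

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])
shifted = preds - preds.max(dim=1, keepdim=True).values            # softmax is shift-invariant per row
stable_softmax = torch.exp(shifted) / torch.exp(shifted).sum(dim=1, keepdim=True)
print(stable_softmax)   # same values as above for this small example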
2. Log_Softmax
Log_Softmax, as its name suggests, simply applies a logarithm after Softmax. For component $k$:

$$\log\!\left(\frac{e^{v_k}}{\sum_{m=1}^{K} e^{v_m}}\right)$$

Note that the $\log$ here is base $e$, i.e., the natural logarithm $\ln$. For our example:
$$[0.2138, 0.2363, 0.2612, 0.2887] \to [\ln(0.2138), \ln(0.2363), \ln(0.2612), \ln(0.2887)]$$

$$[0.2500, 0.2500, 0.2500, 0.2500] \to [\ln(0.2500), \ln(0.2500), \ln(0.2500), \ln(0.2500)]$$

Because of rounding, this manual calculation differs very slightly from the code output; with more decimal places kept, the two agree.
The code implementation is:
import torch
import torch.nn.functional as F

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])
exp = torch.exp(preds)
sum_ = torch.sum(exp, dim=1).reshape(-1, 1)
softmax = exp / sum_
log_softmax = torch.log(softmax)              # natural log of the softmax
print('Manually computed log_softmax:\n', log_softmax)
log_softmax_ = F.log_softmax(preds, dim=1)    # built-in log-softmax
print('F.log_softmax result:\n', log_softmax_)
The two outputs are identical:

Manually computed log_softmax:
 tensor([[-1.5425, -1.4425, -1.3425, -1.2425],
        [-1.3863, -1.3863, -1.3863, -1.3863]])
F.log_softmax result:
 tensor([[-1.5425, -1.4425, -1.3425, -1.2425],
        [-1.3863, -1.3863, -1.3863, -1.3863]])
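Equivalently, since $\log\frac{e^{v_k}}{\sum_m e^{v_m}} = v_k - \log\sum_m e^{v_m}$, Log_Softmax is just the logits minus their per-row log-sum-exp. A small sketch of this reformulation (not from the original post) using torch.logsumexp:

import torch

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])
log_softmax = preds - torch.logsumexp(preds, dim=1, keepdim=True)   # v_k - log(sum_m e^{v_m})
print(log_softmax)   # matches the values above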
3. NLLLoss
The NLLLoss takes the Log_Softmax result, picks out the value at the position given by each sample's label, sums these values, divides by the number of samples, and finally negates the result (the log of a probability is negative, so the loss has to be flipped to a positive value). In our example:

From $[-1.5425, -1.4425, -1.3425, -1.2425]$ and $[-1.3863, -1.3863, -1.3863, -1.3863]$, take the entries at the label positions $target = [2, 3]$, average over the $2$ samples, and negate:

$$-\frac{(-1.3425) + (-1.3863)}{2} = 1.3644$$

The code implementation is:
import torch
import torch.nn.functional as F

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])
target = torch.tensor([2, 3])
exp = torch.exp(preds)
sum_ = torch.sum(exp, dim=1).reshape(-1, 1)
softmax = exp / sum_
log_softmax = torch.log(softmax)
one_hot = F.one_hot(target).float()                       # one-hot mask selects the label positions
nllloss = -torch.sum(one_hot * log_softmax) / target.shape[0]
print('Manually computed nllloss:\n', nllloss)
Log_Softmax = F.log_softmax(preds, dim=1)
Nllloss = F.nll_loss(Log_Softmax, target)                 # expects log-probabilities as input
print('F.nll_loss result:\n', Nllloss)
The two outputs are identical:

Manually computed nllloss:
 tensor(1.3644)
F.nll_loss result:
 tensor(1.3644)
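As an alternative to the one-hot mask used above, the label positions can also be picked out directly with gather; a minimal sketch of this equivalent formulation (not from the original post):

import torch
import torch.nn.functional as F

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])
target = torch.tensor([2, 3])
log_softmax = F.log_softmax(preds, dim=1)
picked = log_softmax.gather(1, target.unsqueeze(1))   # log-probability of each sample's true class
nllloss = -picked.mean()                              # average over the batch and negate
print(nllloss)                                        # tensor(1.3644)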
4. CrossEntropy_Loss
With Softmax, Log_Softmax and NLLLoss understood, the cross-entropy loss CrossEntropy_Loss is simply their combination:

$$CrossEntropy\_Loss = Softmax + Log + NLLLoss = Log\_Softmax + NLLLoss$$

Its formula can be written as:

$$-\frac{1}{N}\sum_{n=1}^{N}\log\!\left(\frac{e^{v_{y_n}}}{\sum_{m=1}^{K} e^{v_m}}\right)$$

CrossEntropy_Loss and NLLLoss give the same result here, because NLLLoss is normally fed the output of Log_Softmax anyway.
The complete code implementation is:
import torch
import torch.nn.functional as F

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])
target = torch.tensor([2, 3])
one_hot = F.one_hot(target).float()              # [2, 3] -> [[0,0,1,0], [0,0,0,1]]
print('[1] one-hot encoded target:\n', one_hot)
exp = torch.exp(preds)
print('[2] exponentiated network output preds:\n', exp)
sum_ = torch.sum(exp, dim=1).reshape(-1, 1)
softmax = exp / sum_
print('[3] softmax:\n', softmax)
log_softmax = torch.log(softmax)
print('[4] log of the softmax:\n', log_softmax)
nllloss = -torch.sum(one_hot * log_softmax) / target.shape[0]
print('[5] manually computed NLLLoss (cross entropy):', nllloss)
print('----------------------------------------------')
Log_Softmax = F.log_softmax(preds, dim=1)
Nllloss = F.nll_loss(Log_Softmax, target)
print('F.nll_loss cross entropy:', Nllloss)
cross_entropy = F.cross_entropy(preds, target)   # softmax + log + NLLLoss in one call
print('F.cross_entropy:', cross_entropy)
The output is:

[1] one-hot encoded target:
 tensor([[0., 0., 1., 0.],
        [0., 0., 0., 1.]])
[2] exponentiated network output preds:
 tensor([[1.1052, 1.2214, 1.3499, 1.4918],
        [1.1052, 1.1052, 1.1052, 1.1052]])
[3] softmax:
 tensor([[0.2138, 0.2363, 0.2612, 0.2887],
        [0.2500, 0.2500, 0.2500, 0.2500]])
[4] log of the softmax:
 tensor([[-1.5425, -1.4425, -1.3425, -1.2425],
        [-1.3863, -1.3863, -1.3863, -1.3863]])
[5] manually computed NLLLoss (cross entropy): tensor(1.3644)
----------------------------------------------
F.nll_loss cross entropy: tensor(1.3644)
F.cross_entropy: tensor(1.3644)
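In training code, the class counterpart torch.nn.CrossEntropyLoss is typically used instead of the functional call; a minimal sketch of the equivalent module form:

import torch
import torch.nn as nn

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])
target = torch.tensor([2, 3])
criterion = nn.CrossEntropyLoss()   # expects raw logits, applies log-softmax internally
print(criterion(preds, target))     # tensor(1.3644), same as F.cross_entropy above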
II. Label Smoothing for the cross-entropy loss
Label Smoothing (paper link) is a regularization technique that helps mitigate overfitting to some extent. In the plain cross-entropy loss, the predictions at the non-target positions are never used; Label Smoothing does use this information. Seen from a distance, it slightly changes the label distribution so that the targets are no longer strictly 0 or 1, hence the name label smoothing.

The Label Smoothing loss can be written as:

$$(1 - \varepsilon)\cdot\left[-\frac{1}{N}\sum_{n=1}^{N}\log\!\left(\frac{e^{v_{y_n}}}{\sum_{m=1}^{K} e^{v_m}}\right)\right] + \varepsilon\cdot\left[-\frac{1}{NK}\sum_{n=1}^{N}\sum_{k=1}^{K}\log\!\left(\frac{e^{v_k}}{\sum_{m=1}^{K} e^{v_m}}\right)\right]$$

The first term, weighted by $(1 - \varepsilon)$, is just the cross-entropy loss; the second term also covers the predictions at the non-target positions. In our example, the second term is computed as follows:

$$[-1.5425, -1.4425, -1.3425, -1.2425] \xrightarrow{\text{sum}} -5.5700$$

$$[-1.3863, -1.3863, -1.3863, -1.3863] \xrightarrow{\text{sum}} -5.5452$$

Sum the Log_Softmax results, negate, and divide by the number of samples $2$ and the number of classes $4$:

$$-\frac{(-5.5700) + (-5.5452)}{2 \times 4} = 1.3894$$

Finally, combine it with the cross-entropy loss using the weight $\varepsilon$; with $\varepsilon = 0.1$:

$$(1 - 0.1) \times 1.3644 + 0.1 \times 1.3894 = 1.3669$$

The code implementation is:
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_combination(x, y, epsilon):
    return epsilon * x + (1 - epsilon) * y

def reduce_loss(loss, reduction='mean'):
    return loss.mean() if reduction == 'mean' else loss.sum() if reduction == 'sum' else loss

class LabelSmoothing_CrossEntropy(nn.Module):
    def __init__(self, epsilon: float = 0.1, reduction='mean'):
        super().__init__()
        self.epsilon = epsilon
        self.reduction = reduction

    def forward(self, preds, target):
        n = preds.size()[-1]                                            # number of classes K
        log_preds = F.log_softmax(preds, dim=-1)
        loss = reduce_loss(-log_preds.sum(dim=-1), self.reduction)      # smoothing term (before dividing by K)
        nll = F.nll_loss(log_preds, target, reduction=self.reduction)   # plain cross-entropy term
        return linear_combination(loss / n, nll, self.epsilon)

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])
target = torch.tensor([2, 3])
ls = LabelSmoothing_CrossEntropy()
lsloss = ls(preds, target)
print('Label smoothing loss:', lsloss)
The output is:

Label smoothing loss: tensor(1.3669)
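For reference, newer PyTorch releases (1.10 and later; whether your environment has one is an assumption here) expose the same smoothing through the label_smoothing argument of the built-in cross entropy, which should reproduce the value above:

import torch
import torch.nn.functional as F

preds = torch.tensor([[0.1, 0.2, 0.3, 0.4], [0.1, 0.1, 0.1, 0.1]])
target = torch.tensor([2, 3])
# label_smoothing=0.1 mixes the one-hot target with a uniform distribution, matching the formula above
print(F.cross_entropy(preds, target, label_smoothing=0.1))   # tensor(1.3669)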