Main references: article 1, article 2
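output = function(input)

Both the input and the output can be a scalar, a vector (every vector in this article is a column vector; its transpose is a row vector), or a matrix. The input is written $x$, $\pmb x$, $\pmb X$ and the output $f$, $\pmb f$, $\pmb F$, which gives 9 cases:

$f(x)$, $f(\pmb x)$, $f(\pmb X)$, $\pmb f(x)$, $\pmb f(\pmb x)$, $\pmb f(\pmb X)$, $\pmb F(x)$, $\pmb F(\pmb x)$, $\pmb F(\pmb X)$

Each case can be written in two ways, numerator layout and denominator layout:

- Numerator layout: the numerator is a column vector and the denominator is a row vector.
- Denominator layout: the denominator is a column vector and the numerator is a row vector.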
$$
\pmb{x}=\begin{bmatrix} x_1\\x_2\\\vdots\\x_n \end{bmatrix}
$$

$$
\pmb{X}=\begin{bmatrix} x_{11}&x_{12}&\cdots&x_{1n}\\x_{21}&x_{22}&\cdots&x_{2n}\\\vdots&\vdots&\ddots&\vdots\\x_{m1}&x_{m2}&\cdots&x_{mn} \end{bmatrix}
$$

$$
\pmb{f}=\begin{bmatrix} f_1\\f_2\\\vdots\\f_n \end{bmatrix}
$$

$$
\pmb{F}=\begin{bmatrix} f_{11}&f_{12}&\cdots&f_{1n}\\f_{21}&f_{22}&\cdots&f_{2n}\\\vdots&\vdots&\ddots&\vdots\\f_{m1}&f_{m2}&\cdots&f_{mn} \end{bmatrix}
$$

$$
\mathrm{vec}(\pmb X)=[x_{11},x_{21},\dots,x_{m1},x_{12},x_{22},\dots,x_{m2},\dots,x_{1n},x_{2n},\dots,x_{mn}]^T
$$

Notes:

1) In deep learning, the denominator layout is the more common choice.
2) The two layouts are only notational conventions of two camps; authors in different fields adopt one or the other.
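The definition of $\mathrm{vec}(\pmb X)$ above stacks the columns of $\pmb X$ one after another. A minimal sketch in plain Python, assuming a matrix is stored as a list of rows (the helper name `vec` is chosen here for illustration):

```python
def vec(X):
    """Stack the columns of a matrix (given as a list of rows) into one flat list."""
    m, n = len(X), len(X[0])
    # Column index j runs in the outer loop, so each column is emitted in full.
    return [X[i][j] for j in range(n) for i in range(m)]


X = [[1, 2, 3],
     [4, 5, 6]]
print(vec(X))  # [1, 4, 2, 5, 3, 6]
```

The order matches $x_{11}, x_{21}, \dots, x_{m1}, x_{12}, \dots$ in the formula, i.e. column-major order.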
- $f(x)$

$$
\frac{\partial f}{\partial x}\tag{1}
$$
- $f(\pmb x)$

Denominator layout (**gradient vector form** / column-vector partial-derivative form / column partial-derivative vector form):

$$
\nabla_{\pmb x}f(\pmb x)=\frac{\partial f(\pmb x)}{\partial \pmb x}=\begin{bmatrix} \frac{\partial f}{\partial x_1}\\\frac{\partial f}{\partial x_2}\\\vdots\\\frac{\partial f}{\partial x_n} \end{bmatrix}\tag{2}
$$

Numerator layout (row-vector partial-derivative form / row partial-derivative vector form):

$$
D_{\pmb x}f(\pmb x)=\frac{\partial f(\pmb x)}{\partial \pmb x^T}=\begin{bmatrix} \frac{\partial f}{\partial x_1}&\frac{\partial f}{\partial x_2}&\cdots&\frac{\partial f}{\partial x_n} \end{bmatrix}\tag{3}
$$
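Equations (2) and (3) hold the same $n$ partial derivatives and differ only in shape. The sketch below illustrates this with central finite differences; the test function `f` and all helper names are assumptions made for this example, not part of the referenced articles.

```python
def partials(f, x, h=1e-6):
    """Central-difference estimates of df/dx_i for i = 1..n."""
    out = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        out.append((f(xp) - f(xm)) / (2 * h))
    return out


def grad_denominator_layout(f, x):
    """n x 1 column, shaped like equation (2)."""
    return [[p] for p in partials(f, x)]


def deriv_numerator_layout(f, x):
    """1 x n row, shaped like equation (3)."""
    return [partials(f, x)]


# Hypothetical test function: f(x) = x1^2 + 3*x1*x2
f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]
x0 = [1.0, 2.0]

col = grad_denominator_layout(f, x0)  # approximately [[8.0], [3.0]]
row = deriv_numerator_layout(f, x0)   # approximately [[8.0, 3.0]]
```

The analytic partial derivatives are $2x_1+3x_2$ and $3x_1$, so at $(1, 2)$ both layouts should contain values close to 8 and 3.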
- $f(\pmb X)$

(4) and (6) are transposes of each other, as are (5) and (7). When $\pmb X$ is a column vector, (2), (4) and (5) coincide, and (3), (6) and (7) coincide.

**Gradient vector form** / column-vector partial-derivative form / column partial-derivative vector form:

$$
\nabla_{\mathrm{vec}\,\pmb X}f(\pmb X)=\frac{\partial f(\pmb X)}{\partial\,\mathrm{vec}\,\pmb X}=\begin{bmatrix} \frac{\partial f}{\partial x_{11}}\\\frac{\partial f}{\partial x_{21}}\\\vdots\\\frac{\partial f}{\partial x_{m1}}\\ \frac{\partial f}{\partial x_{12}}\\\frac{\partial f}{\partial x_{22}}\\\vdots\\\frac{\partial f}{\partial x_{m2}}\\\vdots\\\frac{\partial f}{\partial x_{1n}}\\ \frac{\partial f}{\partial x_{2n}}\\\vdots\\\frac{\partial f}{\partial x_{mn}} \end{bmatrix}\tag{4}
$$

**Gradient matrix**:

$$
\nabla_{\pmb X}f(\pmb X)=\frac{\partial f(\pmb X)}{\partial \pmb X}=\begin{bmatrix} \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{12}}&\cdots&\frac{\partial f}{\partial x_{1n}}\\ \frac{\partial f}{\partial x_{21}}&\frac{\partial f}{\partial x_{22}}&\cdots&\frac{\partial f}{\partial x_{2n}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f}{\partial x_{m1}}&\frac{\partial f}{\partial x_{m2}}&\cdots&\frac{\partial f}{\partial x_{mn}} \end{bmatrix}\tag{5}
$$

Row-vector partial-derivative form / row partial-derivative vector form:

$$
D_{\mathrm{vec}\,\pmb X}f(\pmb X)=\frac{\partial f(\pmb X)}{\partial\,\mathrm{vec}^T\pmb X}=\begin{bmatrix} \frac{\partial f}{\partial x_{11}}& \frac{\partial f}{\partial x_{21}}& \cdots& \frac{\partial f}{\partial x_{m1}}& \frac{\partial f}{\partial x_{12}}& \frac{\partial f}{\partial x_{22}}& \cdots& \frac{\partial f}{\partial x_{m2}}& \cdots& \frac{\partial f}{\partial x_{1n}}& \frac{\partial f}{\partial x_{2n}}& \cdots& \frac{\partial f}{\partial x_{mn}} \end{bmatrix}\tag{6}
$$

Jacobian matrix:

$$
D_{\pmb X}f(\pmb X)=\frac{\partial f(\pmb X)}{\partial \pmb X^T}=\begin{bmatrix} \frac{\partial f}{\partial x_{11}}&\frac{\partial f}{\partial x_{21}}&\cdots&\frac{\partial f}{\partial x_{m1}}\\ \frac{\partial f}{\partial x_{12}}&\frac{\partial f}{\partial x_{22}}&\cdots&\frac{\partial f}{\partial x_{m2}}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f}{\partial x_{1n}}&\frac{\partial f}{\partial x_{2n}}&\cdots&\frac{\partial f}{\partial x_{mn}} \end{bmatrix}\tag{7}
$$
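To see how (4) to (7) relate, the sketch below builds all four forms for one scalar function of a matrix. It assumes the hypothetical example $f(\pmb X)=\pmb a^T\pmb X\pmb b$, whose gradient matrix is $\pmb a\pmb b^T$, and estimates the partial derivatives by central finite differences; the helper names are illustrative only.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def vec(M):
    """Stack the columns of M (a list of rows) into one flat list."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]


def gradient_matrix(f, X, h=1e-6):
    """Shape of equation (5): entry (i, j) estimates df/dx_ij."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G


# Hypothetical example: f(X) = a^T X b with m = 2, n = 3
a = [1.0, 2.0]
b = [3.0, 4.0, 5.0]


def f(X):
    return sum(a[i] * X[i][j] * b[j] for i in range(2) for j in range(3))


X0 = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
G = gradient_matrix(f, X0)      # (5): 2 x 3, close to [[3, 4, 5], [6, 8, 10]]
J = transpose(G)                # (7): 3 x 2, the transpose of (5)
g_col = [[g] for g in vec(G)]   # (4): 6 x 1
g_row = [vec(G)]                # (6): 1 x 6, the transpose of (4)
```

Note that (4) is obtained by applying `vec` to (5), not to (7), which is why the column-major ordering matters.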
The following is my own summary and may contain errors.
- $\pmb f(x)$

The numerator layout $D_x\pmb f(x)$ keeps $\pmb f$ as a column vector; the denominator layout $\nabla_x\pmb f(x)$ transposes it into a row vector:

$$
\nabla_x\pmb f(x)= \frac{\partial \pmb f^T(x)}{\partial x}= \begin{bmatrix} \frac{\partial f_1}{\partial x}& \frac{\partial f_2}{\partial x}& \cdots& \frac{\partial f_n}{\partial x} \end{bmatrix}
$$

$$
D_x\pmb f(x)= \frac{\partial \pmb f(x)}{\partial x}= \begin{bmatrix} \frac{\partial f_1}{\partial x}\\ \frac{\partial f_2}{\partial x}\\ \vdots\\ \frac{\partial f_n}{\partial x} \end{bmatrix}
$$
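For a vector-valued function of a scalar, the column form $\partial\pmb f/\partial x$ and the row form $\partial\pmb f^T/\partial x$ hold the same numbers. A small sketch, assuming the hypothetical function $\pmb f(t)=[t, t^2, t^3]^T$ and a central-difference helper named `derivative`:

```python
def derivative(f, t, h=1e-6):
    """Central-difference estimate of each df_i/dt."""
    fp, fm = f(t + h), f(t - h)
    return [(p - q) / (2 * h) for p, q in zip(fp, fm)]


# Hypothetical example: f(t) = [t, t^2, t^3]
f = lambda t: [t, t ** 2, t ** 3]

d = derivative(f, 2.0)           # close to [1, 4, 12]
column_form = [[v] for v in d]   # 3 x 1, shaped like df/dx
row_form = [d]                   # 1 x 3, shaped like df^T/dx
```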
- $\pmb f(\pmb x)$

$$
\nabla_{\pmb x}\pmb f(\pmb x)=\frac{\partial \pmb f^T(\pmb x)}{\partial \pmb x}=\begin{bmatrix} \frac{\partial f_1}{\partial x_1}&\frac{\partial f_2}{\partial x_1}&\cdots&\frac{\partial f_n}{\partial x_1}\\ \frac{\partial f_1}{\partial x_2}&\frac{\partial f_2}{\partial x_2}&\cdots&\frac{\partial f_n}{\partial x_2}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_1}{\partial x_n}&\frac{\partial f_2}{\partial x_n}&\cdots&\frac{\partial f_n}{\partial x_n} \end{bmatrix}
$$

$$
D_{\pmb x}\pmb f(\pmb x)=\frac{\partial \pmb f(\pmb x)}{\partial \pmb x^T}=\begin{bmatrix} \frac{\partial f_1}{\partial x_1}&\frac{\partial f_1}{\partial x_2}&\cdots&\frac{\partial f_1}{\partial x_n}\\ \frac{\partial f_2}{\partial x_1}&\frac{\partial f_2}{\partial x_2}&\cdots&\frac{\partial f_2}{\partial x_n}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_n}{\partial x_1}&\frac{\partial f_n}{\partial x_2}&\cdots&\frac{\partial f_n}{\partial x_n} \end{bmatrix}
$$
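The two matrices for $\pmb f(\pmb x)$ are transposes of each other: entry $(i, j)$ of $D_{\pmb x}\pmb f$ is $\partial f_i/\partial x_j$, while entry $(i, j)$ of $\nabla_{\pmb x}\pmb f$ is $\partial f_j/\partial x_i$. The sketch below checks this on the hypothetical linear map $\pmb f(\pmb x)=\pmb A\pmb x$, for which $D_{\pmb x}\pmb f=\pmb A$. It deliberately uses 3 outputs and 2 inputs (unlike the square case in the formulas) so that the two layouts have visibly different shapes.

```python
def jacobian_numerator_layout(f, x, h=1e-6):
    """D_x f: entry (i, j) is a central-difference estimate of df_i/dx_j."""
    n_out = len(f(x))
    J = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(n_out):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def gradient_denominator_layout(f, x, h=1e-6):
    """nabla_x f: the transpose of D_x f, entry (i, j) is df_j/dx_i."""
    return [list(c) for c in zip(*jacobian_numerator_layout(f, x, h))]


# Hypothetical example: f(x) = A x with a 3 x 2 matrix A
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
f = lambda x: [sum(a * xi for a, xi in zip(row, x)) for row in A]
x0 = [0.5, -1.0]

D = jacobian_numerator_layout(f, x0)    # 3 x 2, close to A
G = gradient_denominator_layout(f, x0)  # 2 x 3, close to A^T
```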
- $\pmb f(\pmb X)$

$$
\nabla_{\pmb X}\pmb f(\pmb X)=\frac{\partial \pmb f^T(\pmb X)}{\partial\,\mathrm{vec}\,\pmb X}
$$

$$
D_{\pmb X}\pmb f(\pmb X)=\frac{\partial \pmb f(\pmb X)}{\partial\,\mathrm{vec}^T\pmb X}
$$
- $\pmb F(x)$

Here the numerator layout $D_x\pmb F(x)$ keeps the shape of $\pmb F$, and the denominator layout $\nabla_x\pmb F(x)$ is its transpose:

$$
\nabla_x\pmb F(x)= \frac{\partial \pmb F^T(x)}{\partial x}= \begin{bmatrix} \frac{\partial f_{11}}{\partial x}&\frac{\partial f_{21}}{\partial x}&\cdots&\frac{\partial f_{m1}}{\partial x}\\ \frac{\partial f_{12}}{\partial x}&\frac{\partial f_{22}}{\partial x}&\cdots&\frac{\partial f_{m2}}{\partial x}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_{1n}}{\partial x}&\frac{\partial f_{2n}}{\partial x}&\cdots&\frac{\partial f_{mn}}{\partial x} \end{bmatrix}
$$

$$
D_x\pmb F(x)= \frac{\partial \pmb F(x)}{\partial x}= \begin{bmatrix} \frac{\partial f_{11}}{\partial x}&\frac{\partial f_{12}}{\partial x}&\cdots&\frac{\partial f_{1n}}{\partial x}\\ \frac{\partial f_{21}}{\partial x}&\frac{\partial f_{22}}{\partial x}&\cdots&\frac{\partial f_{2n}}{\partial x}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_{m1}}{\partial x}&\frac{\partial f_{m2}}{\partial x}&\cdots&\frac{\partial f_{mn}}{\partial x} \end{bmatrix}
$$
- $\pmb F(\pmb x)$

$$
\nabla_{\pmb x}\pmb F(\pmb x)=\frac{\partial\,\mathrm{vec}^T(\pmb F(\pmb x))}{\partial \pmb x}
$$

$$
D_{\pmb x}\pmb F(\pmb x)=\frac{\partial\,\mathrm{vec}(\pmb F(\pmb x))}{\partial \pmb x^T}
$$
- $\pmb F(\pmb X)$

$$
\nabla_{\pmb X}\pmb F(\pmb X)=\frac{\partial\,\mathrm{vec}^T(\pmb F(\pmb X))}{\partial\,\mathrm{vec}\,\pmb X}
$$

$$
D_{\pmb X}\pmb F(\pmb X)=\frac{\partial\,\mathrm{vec}(\pmb F(\pmb X))}{\partial\,\mathrm{vec}^T\pmb X}
$$
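As a concrete check of the matrix-to-matrix case, the sketch below takes the hypothetical example $\pmb F(\pmb X)=\pmb A\pmb X\pmb B$. By the standard identity $\mathrm{vec}(\pmb A\pmb X\pmb B)=(\pmb B^T\otimes\pmb A)\,\mathrm{vec}(\pmb X)$, the numerator-layout derivative $\partial\,\mathrm{vec}(\pmb F)/\partial\,\mathrm{vec}^T\pmb X$ should equal $\pmb B^T\otimes\pmb A$. All helper names are assumptions for this illustration, and the derivative is estimated by central finite differences.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(c) for c in zip(*P)]


def vec(P):
    """Stack the columns of P (a list of rows) into one flat list."""
    return [P[i][j] for j in range(len(P[0])) for i in range(len(P))]


def unvec(v, m, n):
    """Inverse of vec for an m x n matrix."""
    return [[v[j * m + i] for j in range(n)] for i in range(m)]


def kron(P, Q):
    """Kronecker product: block (i, j) of the result is P[i][j] * Q."""
    return [[P[i][j] * Q[k][l]
             for j in range(len(P[0])) for l in range(len(Q[0]))]
            for i in range(len(P)) for k in range(len(Q))]


def jacobian_vec(F, X, h=1e-6):
    """Entry (r, c) estimates d vec(F)_r / d vec(X)_c."""
    m, n = len(X), len(X[0])
    x = vec(X)
    cols = []
    for c in range(len(x)):
        xp, xm = list(x), list(x)
        xp[c] += h
        xm[c] -= h
        fp = vec(F(unvec(xp, m, n)))
        fm = vec(F(unvec(xm, m, n)))
        cols.append([(p - q) / (2 * h) for p, q in zip(fp, fm)])
    return transpose(cols)  # cols[c] holds column c of the result


# Hypothetical example: A is 2 x 2, X is 2 x 3, B is 3 x 2, so F(X) is 2 x 2
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]
F = lambda X: matmul(matmul(A, X), B)
X0 = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

D = jacobian_vec(F, X0)      # numerator layout, 4 x 6
K = kron(transpose(B), A)    # B^T (x) A, also 4 x 6
G = transpose(D)             # denominator layout, 6 x 4
```

Under these assumptions `D` and `K` agree up to floating-point error, and the denominator-layout form `G` is simply the transpose of `D`.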