Ch1 Multiple Linear Regression
Function model
Functional form
$$f(x)=\theta_{0}+\theta_{1} x_{1}+\cdots+\theta_{p} x_{p}$$
Vector form:
By convention, a vector is a column vector and its transpose is a row vector.
$$f(x)=\sum_{i=0}^{p} \theta_{i} x_{i}=\boldsymbol{\theta}^{T} x=x^{T} \boldsymbol{\theta}=\left[\left(x_{0}=1\right), x_{1}, x_{2}, \ldots, x_{p}\right]\left[\begin{array}{c}\theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{p}\end{array}\right]$$
Loss function (least squares; equal to the MSE up to a constant factor):
$$J(\theta)=\frac{1}{2} \sum_{i=1}^{n}\left(x_{i}^{T} \theta-y_{i}\right)^{2}$$
Linear regression model: find the parameters that minimize the loss function
$$\theta^{*}=\arg \min _{\theta} J(\theta)$$
Model with data
n data points
Predicted values:
$$\hat{Y}=X \theta=\left[\begin{array}{c}x_{1}^{T} \theta \\ x_{2}^{T} \theta \\ \vdots \\ x_{n}^{T} \theta\end{array}\right]=\left[\begin{array}{cccc}X_{10} & X_{11} & \cdots & X_{1 p} \\ X_{20} & X_{21} & \cdots & X_{2 p} \\ \vdots & \vdots & & \vdots \\ X_{n 0} & X_{n 1} & \cdots & X_{n p}\end{array}\right]\left[\begin{array}{c}\theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{p}\end{array}\right]$$
Row i of X is $x_{i}^{T}$, with $X_{i 0}=1$ for the intercept, so X has shape n × (p+1) and matches the p+1 entries of θ.
Actual values (labels; n data points give n labels):
$$Y=\left[\begin{array}{c}y_{1} \\ y_{2} \\ \vdots \\ y_{n}\end{array}\right]$$
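The matrix form above can be sketched in NumPy. This is a minimal illustration, not part of the original notes: the helper names `add_intercept`, `predict` and `loss` are hypothetical, and the toy numbers are made up.

```python
import numpy as np

def add_intercept(features):
    """Prepend the constant column x_0 = 1 to an (n, p) feature array."""
    features = np.asarray(features, dtype=float)
    ones = np.ones((features.shape[0], 1))
    return np.hstack([ones, features])

def predict(X, theta):
    """Return Y_hat = X @ theta for an (n, p+1) design matrix X."""
    return X @ theta

def loss(X, theta, y):
    """J(theta) = 1/2 * sum of squared residuals."""
    residual = X @ theta - y
    return 0.5 * (residual @ residual)

# Toy data (made up): n = 3 examples, p = 2 features.
features = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
theta = np.array([0.5, 1.0, -1.0])   # theta_0, theta_1, theta_2
y = np.zeros(3)

X = add_intercept(features)          # shape (3, 3), first column all ones
y_hat = predict(X, theta)            # entry i equals x_i^T theta
```

Storing the intercept as a column of ones keeps the whole model in a single matrix product, which is what the later derivations rely on.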
Solving the model
Gradient descent method
Gradient Descent
$$\theta:=\theta-\alpha \nabla_{\theta} J(\theta)$$
$$J(\theta)=\frac{1}{2} \sum_{i=1}^{n}\left(x_{i}^{T} \theta-y_{i}\right)^{2}$$
Here $\nabla_{\theta}$ is the gradient operator; the gradient is the natural extension of the partial derivative to a vector of parameters.
$$\nabla_{\theta} J=\left[\begin{array}{c}\frac{\partial J}{\partial \theta_{0}} \\ \vdots \\ \frac{\partial J}{\partial \theta_{p}}\end{array}\right]$$
Partial derivative of the loss for a single example:
$$\begin{aligned}\frac{\partial}{\partial \theta_{j}} \frac{1}{2}\left(x_{i}^{T} \theta-y_{i}\right)^{2} &=\frac{\partial}{\partial \theta_{j}} \frac{1}{2}\left(\sum_{k=0}^{p} x_{i, k} \theta_{k}-y_{i}\right)^{2}, \quad x_{i}=\left(x_{i, 0}, \ldots, x_{i, p}\right)^{T} \\ &=\left(\sum_{k=0}^{p} x_{i, k} \theta_{k}-y_{i}\right) \frac{\partial}{\partial \theta_{j}}\left(\sum_{k=0}^{p} x_{i, k} \theta_{k}-y_{i}\right) \\ &=\left(f\left(x_{i}\right)-y_{i}\right) x_{i, j}\end{aligned}$$
Summing over the n examples gives $\frac{\partial J}{\partial \theta_{j}}=\sum_{i=1}^{n}\left(f\left(x_{i}\right)-y_{i}\right) x_{i, j}$, that is, $\nabla_{\theta} J(\theta)=X^{T}(X \theta-Y)$.
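The update rule and the per-example derivative above can be combined into a short batch gradient descent loop. The sketch below is illustrative only: the function names, the step size `alpha`, the iteration count and the synthetic data are assumptions, not values from the notes.

```python
import numpy as np

def gradient(X, theta, y):
    """Gradient of J(theta) = 1/2 * ||X theta - y||^2, i.e. X^T (X theta - y)."""
    return X.T @ (X @ theta - y)

def gradient_descent(X, y, alpha=0.01, n_iters=5000):
    """Repeat theta := theta - alpha * grad J(theta), starting from zeros."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta = theta - alpha * gradient(X, theta, y)
    return theta

# Synthetic, noise-free data (assumed): y = 1 + 2 * x.
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

theta_gd = gradient_descent(X, y, alpha=0.05, n_iters=5000)
```

Because J here is a sum rather than a mean, the gradient grows with n, so a step size that works for 20 points may diverge on a larger dataset; dividing the gradient by n is a common way to make `alpha` less sensitive to the dataset size.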
Normal equation method
$$\begin{aligned} J(\theta) &=\frac{1}{2}\|Y-X \theta\|^{2} \\ &=\frac{1}{2}(X \theta-Y)^{T}(X \theta-Y) \\ &=\frac{1}{2}\left(\theta^{T} X^{T} X \theta-2 Y^{T} X \theta+Y^{T} Y\right) \end{aligned}$$
Note:
$$\begin{array}{l}\frac{\partial \mathbf{x}^{T} \mathbf{B} \mathbf{x}}{\partial \mathbf{x}}=\left(\mathbf{B}+\mathbf{B}^{T}\right) \mathbf{x} \\ \frac{\partial \mathbf{x}^{T} \mathbf{a}}{\partial \mathbf{x}}=\frac{\partial \mathbf{a}^{T} \mathbf{x}}{\partial \mathbf{x}}=\mathbf{a}\end{array}$$
Let
$$B=X^{T} X, \quad B^{T}=B \Longrightarrow\left(B+B^{T}\right) \theta=2 B \theta$$
$$\begin{aligned} \nabla_{\theta} J(\theta) &=\frac{\partial J(\theta)}{\partial \theta}=\frac{\partial}{\partial \theta}\left[\frac{1}{2}\left(\theta^{T} X^{T} X \theta-2 Y^{T} X \theta+Y^{T} Y\right)\right] \\ &=X^{T} X \theta-\left(Y^{T} X\right)^{T}=X^{T} X \theta-X^{T} Y=0 \\ \Longrightarrow \quad X^{T} X \theta &=X^{T} Y \\ \Longrightarrow \quad \theta^{*} &=\left(X^{T} X\right)^{-1} X^{T} Y \end{aligned}$$
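The closed-form solution can be checked numerically. The sketch below assumes $X^{T}X$ is invertible and solves the linear system $X^{T}X\theta=X^{T}Y$ with `np.linalg.solve` instead of forming the inverse explicitly; the function name `normal_equation` and the toy data are hypothetical.

```python
import numpy as np

def normal_equation(X, y):
    """Solve X^T X theta = X^T y for theta (assumes X^T X is invertible)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy data (made up): y = 1 + 2 * x, with an intercept column in X.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta_star = normal_equation(X, y)

# At the minimizer the gradient X^T (X theta - y) should vanish.
grad_at_optimum = X.T @ (X @ theta_star - y)
```

Solving the system is generally preferred over computing $(X^{T}X)^{-1}$ directly, since it is cheaper and numerically better behaved; the result is the same $\theta^{*}$ whenever the inverse exists.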
Stochastic gradient descent
Mini-batch GD
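Stochastic gradient descent uses only a single training example for each update. Mini-batch GD instead splits the data into several batches and updates the parameters one batch at a time. The examples in a batch jointly determine the gradient direction for that step, so the descent is less likely to go astray and the randomness is reduced.
Training proceeds batch by batch; one full pass over all the batches makes up one epoch.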
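A minimal mini-batch loop might look like the following sketch. The batch size, step size, number of epochs, per-epoch reshuffling and the choice to average the gradient over each batch are all assumptions made for illustration; the notes do not specify them.

```python
import numpy as np

def minibatch_gd(X, y, alpha=0.5, batch_size=8, n_epochs=500, seed=0):
    """Mini-batch gradient descent for the squared-error loss."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):                      # one epoch = one pass over the data
        order = rng.permutation(n)                 # reshuffle before each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]  # indices of the current batch
            residual = X[idx] @ theta - y[idx]
            grad = X[idx].T @ residual / len(idx)  # gradient averaged over the batch
            theta = theta - alpha * grad
    return theta

# Synthetic, noise-free data (assumed): y = 1 + 2 * x with x in [0, 1].
x = np.linspace(0.0, 1.0, 64)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

theta_mb = minibatch_gd(X, y)
```

Setting `batch_size=1` turns this into plain stochastic gradient descent, and setting it to `n` recovers full-batch gradient descent (with an averaged gradient).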
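Global optimum
$J(\theta)$ is a convex function (naming conventions for convex versus concave differ between textbooks; convex here means bowl-shaped, with a minimum). For a convex function the second derivative is nonnegative; here the second derivative is the Hessian $X^{T}X$, which is a positive semidefinite matrix: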
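$$\nabla_{\theta}^{2} J(\theta)=X^{T} X$$
When the number of training examples n is larger than the dimension of each example (p+1 attributes, or features), $X^{T}X$ is usually invertible. In that case it is positive definite, and the parameters obtained are the global optimum. When $X^{T}X$ is not invertible, more than one parameter vector solves the problem; regularization can then be used to pick a solution that reflects an "inductive bias".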
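The two cases can be illustrated numerically. In this sketch a duplicated feature column makes $X^{T}X$ singular, and an L2 (ridge) penalty is used as one possible regularizer; the notes do not say which regularizer is meant, so ridge, the penalty strength `lam`, and the helper names are assumptions.

```python
import numpy as np

def is_positive_definite(A, tol=1e-10):
    """True if every eigenvalue of the symmetric matrix A exceeds tol."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

def ridge_solution(X, y, lam):
    """Solve (X^T X + lam * I) theta = X^T y; lam > 0 makes the system invertible."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

X_full = np.column_stack([np.ones_like(x), x])     # full column rank
X_dup = np.column_stack([np.ones_like(x), x, x])   # same feature twice

pd_full = is_positive_definite(X_full.T @ X_full)  # unique least-squares solution
pd_dup = is_positive_definite(X_dup.T @ X_dup)     # singular: many solutions

# Ridge picks one solution; by symmetry it splits the weight evenly
# between the two identical columns.
theta_ridge = ridge_solution(X_dup, y, lam=1e-3)
```

Note that this simple version also penalizes the intercept $\theta_{0}$; many implementations leave the intercept unpenalized.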
Evaluation methods
Hold-out method
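Randomly select part of the labeled data as the test set (hollow points in the figure) and use the rest as the training set (solid points). Fit the regression model on the training set, then evaluate it on the test set (MSE = 2.4 in the illustrated example). The test set should be neither too large nor too small: with n training examples and m test examples, keep 2 <= n:m <= 4.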
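A hold-out evaluation can be sketched as below. The 3:1 split (inside the 2 to 4 range given above), the random seeds, the noise level and the helper names are all assumptions; the data is synthetic and unrelated to the figure's MSE of 2.4.

```python
import numpy as np

def holdout_split(X, y, test_fraction=0.25, seed=0):
    """Randomly split the rows of (X, y) into a training part and a test part."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(X.shape[0])
    m = int(round(test_fraction * X.shape[0]))   # test-set size
    test_idx, train_idx = order[:m], order[m:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

def fit(X, y):
    """Least-squares fit via the normal equation (assumes X^T X is invertible)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def mse(X, theta, y):
    """Mean squared error of the predictions X @ theta against y."""
    return float(np.mean((X @ theta - y) ** 2))

# Synthetic data (assumed): y = 1 + 2 * x plus Gaussian noise.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, size=40)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=40)

X_train, y_train, X_test, y_test = holdout_split(X, y)   # n = 30, m = 10
theta_hat = fit(X_train, y_train)
test_mse = mse(X_test, theta_hat, y_test)
```

The model never sees the test rows during fitting, so `test_mse` estimates the error on new data rather than the training error.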
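Cross-validation
In 10-fold cross-validation, the dataset is split into 10 parts; in each round one part serves as the test set and the remaining nine as the training set, rotating through all 10 parts.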
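The rotation described above can be sketched as follows. The function name `k_fold_mse`, the random shuffling before splitting, and reporting one MSE per fold (usually averaged afterwards) are assumptions added for illustration.

```python
import numpy as np

def k_fold_mse(X, y, k=10, seed=0):
    """Return the test MSE of each of the k folds for a least-squares fit."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(X.shape[0]), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]                                   # fold i is the test set
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        X_tr, y_tr = X[train_idx], y[train_idx]
        theta = np.linalg.solve(X_tr.T @ X_tr, X_tr.T @ y_tr)  # fit on the other folds
        scores.append(np.mean((X[test_idx] @ theta - y[test_idx]) ** 2))
    return np.array(scores)

# Synthetic, noise-free data (assumed): y = 1 + 2 * x.
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

fold_scores = k_fold_mse(X, y, k=10)
mean_score = fold_scores.mean()
```

Every example is used for testing exactly once, which makes the estimate less dependent on one particular split than the hold-out method.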
Performance measures
Linear regression model: sum-of-squares error
Report the MSE (mean squared error) on the test set
$$J_{\text{train}}(\theta)=\frac{1}{2} \sum_{i=1}^{n}\left(\mathbf{x}_{i}^{T} \theta-y_{i}\right)^{2}$$
$$\theta^{*}=\operatorname{argmin} J_{\text{train}}(\theta)=\left(X_{\text{train}}^{T} X_{\text{train}}\right)^{-1} X_{\text{train}}^{T} \vec{y}_{\text{train}}$$
$$J_{\text{test}}=\frac{1}{m} \sum_{i=n+1}^{n+m}\left(\mathbf{x}_{i}^{T} \theta^{*}-y_{i}\right)^{2}=\frac{1}{m} \sum_{i=n+1}^{n+m} \varepsilon_{i}^{2}$$
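Classification tasks: error rate and accuracy
The error rate is the fraction of all examples that are misclassified.
Accuracy is the fraction of all examples that are classified correctly.
For binary classification:
Precision: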
$$P=\frac{TP}{TP+FP}$$
Recall:
$$R=\frac{TP}{TP+FN}$$
F1:
$$F_{1}=\frac{2 \times P \times R}{P+R}=\frac{2 \times TP}{\text{total number of examples}+TP-TN}$$
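The measures above can be computed from the four confusion-matrix counts. The sketch below uses made-up labels (1 = positive, 0 = negative) and hypothetical helper names; it also checks that the two forms of F1 agree on this example.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN), F1 = 2PR / (P + R)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up labels for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
total = tp + fp + fn + tn

accuracy = (tp + tn) / total
error_rate = (fp + fn) / total
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

# Second form of F1 from the formula above.
f1_alt = 2 * tp / (total + tp - tn)
```

The helper divides by TP + FP and TP + FN without guarding against zero, so it assumes at least one predicted positive and one actual positive.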