Gradient Descent for Logistic Regression
The gradient descent update rule for logistic regression is:
\theta_{j}:=\theta_{j}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}
where:
h_{\theta}\left(x^{(i)}\right)=g\left(\theta^{T} x^{(i)}\right)=\frac{1}{1+e^{-\theta^{T} x^{(i)}}}
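As a concrete reference, here is a minimal NumPy sketch of the sigmoid hypothesis above; the names `sigmoid` and `hypothesis` are illustrative, not from the original text.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x) for a single example x (with x_0 = 1)."""
    return sigmoid(theta @ x)
```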
The vectorized form of the update is:
\theta:=\theta-\frac{\alpha}{m} X^{T}(g(X \theta)-\vec{y})
where:
\vec{y}=\left(\begin{array}{c} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{array}\right) \qquad \theta=\left(\begin{array}{c} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{array}\right) \qquad X=\left[\begin{array}{cccc} x_{0}^{(1)} & x_{1}^{(1)} & \cdots & x_{n}^{(1)} \\ x_{0}^{(2)} & x_{1}^{(2)} & \cdots & x_{n}^{(2)} \\ \vdots & \vdots & & \vdots \\ x_{0}^{(m)} & x_{1}^{(m)} & \cdots & x_{n}^{(m)} \end{array}\right]_{m \times(n+1)}
X \theta=\left[\begin{array}{c} \theta_{0} x_{0}^{(1)}+\theta_{1} x_{1}^{(1)}+\theta_{2} x_{2}^{(1)}+\cdots+\theta_{n} x_{n}^{(1)} \\ \theta_{0} x_{0}^{(2)}+\theta_{1} x_{1}^{(2)}+\theta_{2} x_{2}^{(2)}+\cdots+\theta_{n} x_{n}^{(2)} \\ \vdots \\ \theta_{0} x_{0}^{(m)}+\theta_{1} x_{1}^{(m)}+\theta_{2} x_{2}^{(m)}+\cdots+\theta_{n} x_{n}^{(m)} \end{array}\right] \qquad g(X \theta)=\left[\begin{array}{c} h_{\theta}\left(x^{(1)}\right) \\ h_{\theta}\left(x^{(2)}\right) \\ \vdots \\ h_{\theta}\left(x^{(m)}\right) \end{array}\right]
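Under these conventions (X of shape m×(n+1) with a leading column of ones, θ of length n+1, y of length m), one vectorized update step might look like the following sketch; the function name and signature are assumptions for illustration.

```python
import numpy as np

def gradient_step(theta, X, y, alpha):
    """One vectorized update: theta := theta - (alpha/m) * X^T (g(X theta) - y).

    X : (m, n+1) design matrix whose first column is all ones (x_0 = 1)
    y : (m,) vector of 0/1 labels
    """
    m = X.shape[0]
    predictions = 1.0 / (1.0 + np.exp(-X @ theta))  # g(X theta), shape (m,)
    gradient = X.T @ (predictions - y) / m          # shape (n+1,)
    return theta - alpha * gradient
```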
Detailed vectorization derivation
\begin{aligned} &\sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)} \\\\ =&\left[h_{\theta}\left(x^{(1)}\right)-y^{(1)}\right] x_{j}^{(1)}+\left[h_{\theta}\left(x^{(2)}\right)-y^{(2)}\right] x_{j}^{(2)}+\cdots+\left[h_{\theta}\left(x^{(m)}\right)-y^{(m)}\right] x_{j}^{(m)} \\\\ =&\left(x_{j}^{(1)}, x_{j}^{(2)}, \cdots, x_{j}^{(m)}\right) \cdot\left(\begin{array}{c} h_{\theta}\left(x^{(1)}\right)-y^{(1)} \\ h_{\theta}\left(x^{(2)}\right)-y^{(2)} \\ \vdots \\ h_{\theta}\left(x^{(m)}\right)-y^{(m)} \end{array}\right) \\\\ =&\left(x_{j}^{(1)}, x_{j}^{(2)}, \cdots, x_{j}^{(m)}\right) \cdot\left[\left(\begin{array}{c} h_{\theta}\left(x^{(1)}\right) \\ h_{\theta}\left(x^{(2)}\right) \\ \vdots \\ h_{\theta}\left(x^{(m)}\right) \end{array}\right)-\left(\begin{array}{c} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{array}\right)\right] \\\\ =& x_{j} \cdot\left[g(X \theta)-\vec{y}\right] \end{aligned}
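This identity is easy to verify numerically: for each feature j, the explicit sum over examples equals the dot product of the j-th column of X with the residual vector. A small sanity check with made-up data (all values here are arbitrary) follows.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # first column: x_0 = 1
y = rng.integers(0, 2, size=m).astype(float)
theta = rng.normal(size=n + 1)

residual = 1.0 / (1.0 + np.exp(-X @ theta)) - y  # g(X theta) - y

j = 1
loop_sum = sum(residual[i] * X[i, j] for i in range(m))  # sum_i (h - y) x_j^(i)
vectorized = X[:, j] @ residual                          # x_j . [g(X theta) - y]
assert np.isclose(loop_sum, vectorized)
```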
Then, writing x_j = (x_j^{(1)}, x_j^{(2)}, ..., x_j^{(m)}) for the j-th column of X viewed as a row vector, the update for each component becomes:
\theta_{j}:=\theta_{j}-\frac{\alpha}{m} x_{j}\left[g(X \theta)-\vec{y}\right]
\left[\begin{array}{c} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{array}\right]:=\left[\begin{array}{c} \theta_{0} \\ \theta_{1} \\ \vdots \\ \theta_{n} \end{array}\right]-\frac{\alpha}{m}\left[\begin{array}{c} x_{0} \\ x_{1} \\ \vdots \\ x_{n} \end{array}\right]\left[g\left(X\theta\right)-\vec{y}\right]
Since the rows x_0, x_1, ..., x_n stacked in this way are exactly X^T, we finally obtain:
\theta:=\theta-\frac{\alpha}{m} X^{T}(g(X \theta)-\vec{y})
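Putting everything together, a full batch gradient descent loop built on this final update might look like the sketch below; the learning rate and iteration count are arbitrary illustrative choices, not values from the original text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent: theta := theta - (alpha/m) X^T (g(X theta) - y)."""
    m, d = X.shape
    theta = np.zeros(d)
    for _ in range(num_iters):
        theta -= (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta
```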