Gradient Descent Algorithm
Exhaustive search and visual inspection are infeasible: once the number of weights w grows large, the time cost explodes, and even then we may only land on a local optimum.

Gradient descent can find a local optimum, but it may not find the global optimum (the point that no local optimum beats).

Why, then, does deep learning still mostly use gradient descent to search for the optimum? Because results from many studies suggest that deep learning optimization problems do not actually contain many bad local optima. There is, however, a kind of point to watch out for: the saddle point, where the partial derivative is 0 even though the point is not an optimum.
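A minimal sketch of why a zero gradient is not enough (the function f(w) = w³ is my own illustration, not from the original notes): at w = 0 the derivative vanishes, so gradient descent stalls there, yet w = 0 is neither a minimum nor a maximum.

def f(w):
    # f(w) = w**3 has a saddle point at w = 0
    return w ** 3

def grad(w):
    # analytic derivative f'(w) = 3 * w**2, which is 0 at w = 0
    return 3 * w ** 2

w = 0.0
for _ in range(10):
    w -= 0.01 * grad(w)  # the update never moves away from the saddle
print(w)  # prints 0.0, even though w = 0 is not an optimum of w**3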
Model:

$$y = wx + b$$

MSE:
$$\mathrm{cost}(w)=\frac{1}{N}\sum_{n=1}^{N}\left(\hat{y}_n-y_n\right)^2$$
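A worked example (using the dataset from the code section below, with w = 1 and b = 0; the numbers are mine): the predictions are ŷ = 1, 2, 3 against targets y = 2, 4, 6, so

$$\mathrm{cost}(1)=\frac{1}{3}\left[(1-2)^2+(2-4)^2+(3-6)^2\right]=\frac{1+4+9}{3}=\frac{14}{3}\approx 4.67$$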
Optimization objective:

$$w^*=\arg\min_{w}\,\mathrm{cost}(w)$$

Take the derivative:
$$\frac{\partial\,\mathrm{cost}}{\partial w}$$

Update:
$$w = w-\alpha\,\frac{\partial\,\mathrm{cost}}{\partial w},\qquad \alpha \text{ is the learning rate (step size)}$$
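A small sketch of how the step size behaves (my own illustration; it uses the closed-form gradient derived just below and the dataset from the code section, for which cost(w) = (14/3)(w − 2)²): a small α converges, an overly large α diverges.

# Effect of the learning rate: small steps converge, large steps overshoot.
for alpha in (0.01, 0.5):
    w = 0.0
    for _ in range(20):
        grad = 28.0 / 3.0 * (w - 2.0)  # closed-form d cost / d w for this dataset
        w -= alpha * grad
    print("alpha =", alpha, "-> w =", w)  # alpha=0.01 approaches 2; alpha=0.5 blows up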
Then finding the optimum is just iterating this update. Expanding the derivative:

$$\begin{aligned}
\frac{\partial\,\mathrm{cost}}{\partial w}
&=\frac{\partial}{\partial w}\,\frac{1}{N}\sum_{n=1}^{N}\left(x_n\cdot w-y_n\right)^2\\
&=\frac{1}{N}\sum_{n=1}^{N}\frac{\partial\left(x_n\cdot w-y_n\right)^2}{\partial w}\\
&=\frac{1}{N}\sum_{n=1}^{N}2\left(x_n\cdot w-y_n\right)\frac{\partial\left(x_n\cdot w-y_n\right)}{\partial w}\\
&=\frac{1}{N}\sum_{n=1}^{N}2\,x_n\left(x_n\cdot w-y_n\right)
\end{aligned}$$
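A quick numeric check (with w = 1, α = 0.01, and the dataset from the code section; the values are mine, not from the original notes):

$$\left.\frac{\partial\,\mathrm{cost}}{\partial w}\right|_{w=1}=\frac{2\cdot 1\cdot(1-2)+2\cdot 2\cdot(2-4)+2\cdot 3\cdot(3-6)}{3}=\frac{-2-8-18}{3}=-\frac{28}{3}\approx -9.33$$

so one update moves w from 1 to 1 − 0.01 · (−9.33) ≈ 1.09, a step toward the optimum w = 2.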
Update:

$$w=w-\alpha\,\frac{1}{N}\sum_{n=1}^{N}2\,x_n\left(x_n\cdot w-y_n\right)$$

Code:
import random
import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

w = random.random()  # random initial weight

def forward(x):
    """Linear model y = x * w."""
    return x * w

def cost(x, y):
    """Mean squared error over the whole dataset."""
    cost = 0
    for x_i, y_i in zip(x, y):
        y_pred = forward(x_i)
        cost += (y_pred - y_i) ** 2
    return cost / len(x)

def gradient(x, y):
    """Analytic gradient: d cost / d w = (1/N) * sum(2 * x_n * (x_n * w - y_n))."""
    grad = 0
    for x_i, y_i in zip(x, y):
        grad += 2 * x_i * (x_i * w - y_i)
    return grad / len(x)

epoch_list = []
cost_list = []
for epoch in range(100):
    cost_val = cost(x_data, y_data)
    grad_val = gradient(x_data, y_data)
    w -= 0.01 * grad_val  # update step with learning rate alpha = 0.01
    epoch_list.append(epoch)
    cost_list.append(cost_val)
    print("epoch:", epoch, "w:", w, "loss(MSE):", cost_val)

print("Predict:", forward(4))

plt.plot(epoch_list, cost_list)
plt.ylabel('Loss')
plt.xlabel('Epoch')  # the x axis is the training epoch, not w
plt.show()
The loss decreases steadily as the number of training epochs increases.
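As a sanity check (my own addition, not in the original notes; numeric_gradient is a hypothetical helper), the analytic gradient used above can be compared against a centered finite difference:

def numeric_gradient(x, y, w, eps=1e-6):
    # Centered finite-difference approximation of d cost / d w.
    def cost_at(w_):
        return sum((x_i * w_ - y_i) ** 2 for x_i, y_i in zip(x, y)) / len(x)
    return (cost_at(w + eps) - cost_at(w - eps)) / (2 * eps)

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
analytic = sum(2 * x_i * (x_i * 1.0 - y_i) for x_i, y_i in zip(x_data, y_data)) / len(x_data)
print(analytic, numeric_gradient(x_data, y_data, 1.0))  # both ≈ -9.3333 at w = 1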