Linear Regression
Model Definition
Let's get to know linear regression through a simple example.
$$\hat{y} = x_1 w_1 + x_2 w_2 + b$$

where $w_1$ and $w_2$ are the weights, $b$ is the bias, and $\hat{y}$ is the predicted value, i.e. the model's estimate of $y$. Next we evaluate the model.
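As a quick sanity check, here is a minimal sketch of evaluating this model for a single sample (the weights and inputs below are made up purely for illustration):

```python
# Illustrative values only, not from the text
w1, w2, b = 2.0, -3.4, 4.2
x1, x2 = 1.0, 0.5
y_hat = x1 * w1 + x2 * w2 + b  # 2.0 - 1.7 + 4.2 = 4.5
print(y_hat)  # 4.5
```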
Training
Loss Function
$$l(w_1, w_2, b) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2}\left(x_1^{(i)} w_1 + x_2^{(i)} w_2 + b - y^{(i)}\right)^2$$

Given a dataset, our goal is to find:
$$w_1^*, w_2^*, b^* = \arg\min_{w_1, w_2, b} \; l(w_1, w_2, b)$$
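To make the objective concrete, here is a minimal sketch that evaluates this loss on a toy dataset for one candidate $(w_1, w_2, b)$ (all values below are illustrative, not from the text):

```python
import numpy as np

# Toy dataset: 3 samples, 2 features each (illustrative)
X = np.array([[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]])
y = np.array([3.0, 1.0, 5.0])

def loss(w1, w2, b):
    y_hat = X[:, 0] * w1 + X[:, 1] * w2 + b
    return np.mean(0.5 * (y_hat - y) ** 2)

print(loss(1.0, 0.5, 0.0))  # loss for one candidate parameter setting
```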
Optimization Algorithm
We optimize with mini-batch stochastic gradient descent: at each step, sample a mini-batch $B$ and move each parameter against its gradient, scaled by the learning rate $\xi$:

$$w_1 \leftarrow w_1 - \frac{\xi}{|B|} \sum_{i \in B} \frac{\partial l^{(i)}(w_1, w_2, b)}{\partial w_1}$$

$$w_2 \leftarrow w_2 - \frac{\xi}{|B|} \sum_{i \in B} \frac{\partial l^{(i)}(w_1, w_2, b)}{\partial w_2}$$

$$b \leftarrow b - \frac{\xi}{|B|} \sum_{i \in B} \frac{\partial l^{(i)}(w_1, w_2, b)}{\partial b}$$
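A minimal NumPy sketch of one such update step, using the analytic gradients of the squared loss (function and variable names are illustrative):

```python
import numpy as np

def sgd_step(X_batch, y_batch, w1, w2, b, lr):
    # Gradient of 0.5*(x1*w1 + x2*w2 + b - y)^2, averaged over the mini-batch
    err = X_batch[:, 0] * w1 + X_batch[:, 1] * w2 + b - y_batch
    w1 -= lr * np.mean(err * X_batch[:, 0])
    w2 -= lr * np.mean(err * X_batch[:, 1])
    b -= lr * np.mean(err)
    return w1, w2, b
```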
Representing Linear Regression
We first try the simplest model, with input 1: $x_1$, input 2: $x_2$, and output: $o$.
Vector form of the model to fit:
$$\hat{y} = \vec{X} \vec{w} + b$$
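In code, this vector form is a single matrix-vector product. A minimal sketch with made-up numbers:

```python
import torch

X = torch.tensor([[1.0, 2.0], [0.5, -1.0]])  # 2 samples, 2 features (illustrative)
w = torch.tensor([[2.0], [-3.4]])
b = 4.2
y_hat = torch.mm(X, w) + b                   # shape (2, 1)
print(y_hat)
```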
- The corresponding loss function:
$$l(\theta) = \frac{1}{2n} (\hat{y} - y)^T (\hat{y} - y)$$

- Mini-batch gradient descent:
$$\vec{\theta} \leftarrow \vec{\theta} - \frac{\xi}{|B|} \sum_{i \in B} \nabla_{\theta} l^{(i)}(\vec{\theta})$$
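Since $\nabla_\theta l = \frac{1}{n} X^T (\hat{y} - y)$ for this loss, one update step reduces to a couple of matrix operations. A minimal sketch, where the bias is folded into $\theta$ by appending a column of ones (names are illustrative):

```python
import numpy as np

def sgd_step_vec(X, y, theta, lr):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    grad = Xb.T @ (Xb @ theta - y) / len(y)        # gradient of (1/2n)||Xb @ theta - y||^2
    return theta - lr * grad
```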
Implementation
Generating the Dataset
$$y = \vec{X} \vec{w} + b + \epsilon$$

where $\epsilon$ is the noise (error) term.
```python
import torch
from IPython import display
from matplotlib import pyplot as plt
import numpy as np
import random

num_inputs = 2
num_examples = 1000
true_w = [2, -3.4]
true_b = 4.2
# Cast to float32 so later operations with float32 parameters
# don't fail on a dtype mismatch (from_numpy would give float64)
features = torch.tensor(np.random.normal(0, 1, (num_examples, num_inputs)),
                        dtype=torch.float32)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
# Add Gaussian noise epsilon with standard deviation 0.01
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()),
                       dtype=torch.float32)
print(features[0], labels[0])

def set_figsize(figsize=(3.5, 2.5)):
    plt.rcParams['figure.figsize'] = figsize

set_figsize()
plt.scatter(features[:, 1].numpy(), labels.numpy(), 1)
```
Reading Data in Mini-batches
```python
def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # read the samples in random order
    for i in range(0, num_examples, batch_size):
        # The last batch may be smaller than batch_size
        j = torch.LongTensor(indices[i: min(i + batch_size, num_examples)])
        yield features.index_select(0, j), labels.index_select(0, j)

batch_size = 10
for X, y in data_iter(batch_size, features, labels):
    print(X, y)
    break
```
Initializing the Model Parameters
```python
w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)
b = torch.zeros(1, dtype=torch.float32)
# Track gradients for both parameters
w.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)

def linreg(X, w, b):
    # Linear model: y_hat = X w + b
    return torch.mm(X, w) + b
```
Loss Function
```python
def squared_loss(y_hat, y):
    # Reshape y to y_hat's shape to avoid unintended broadcasting
    return (y_hat - y.view(y_hat.size())) ** 2 / 2
```
Optimization Algorithm
```python
def sgd(params, lr, batch_size):
    for param in params:
        # Update .data in place so autograd does not track this step
        param.data -= lr * param.grad / batch_size
```
Training the Model
```python
lr = 0.03
num_epochs = 3
net = linreg
loss = squared_loss
for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y).sum()  # mini-batch loss
        l.backward()                     # compute gradients
        sgd([w, b], lr, batch_size)      # update parameters
        # Reset the gradients before the next batch
        w.grad.data.zero_()
        b.grad.data.zero_()
    train_l = loss(net(features, w, b), labels)
    print('epoch %d, loss %f' % (epoch + 1, train_l.mean().item()))
print(true_w, '\n', w)
print(true_b, '\n', b)
```