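【Reference: scikit-learn中文社区 (scikit-learn Chinese community)】 for the underlying principles, worked examples, and API parameters
This post draws on (从零开始)《机器学习的数学原理和算法实践》 (From Scratch: Mathematical Principles and Algorithm Practice of Machine Learning).
Another good article: 【Reference: 机器学习 | 算法笔记- 线性回归(Linear Regression) - eo_will - 博客园 (Machine Learning | Algorithm Notes - Linear Regression - eo_will - cnblogs)】
【Reference: 公式里面的arg是什么意思? (What does arg mean in a formula?) - 一个做图像文本的深度学习人 - CSDN blog】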
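Terminology
MSE (mean squared error): the average of the squared residuals, where a residual is the difference $y_i - f(\pmb{x}_i)$ between an observed value and the model's prediction. MSE is not the residual itself, and it should not be confused with MAE (mean absolute error), which averages the absolute residuals instead.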
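To make the three terms concrete, here is a minimal sketch on a few hypothetical values (not from any dataset) that computes the residuals, MSE and MAE by hand with NumPy and checks them against scikit-learn's `mean_squared_error` and `mean_absolute_error`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical toy values, chosen only for illustration
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Residual: observed value minus predicted value, one per sample
residuals = y_true - y_pred  # [0.5, 0.0, -1.5, -1.0]

# MSE: mean of the squared residuals -> (0.25 + 0 + 2.25 + 1.0) / 4 = 0.875
mse_manual = np.mean(residuals ** 2)
# MAE: mean of the absolute residuals -> (0.5 + 0 + 1.5 + 1.0) / 4 = 0.75
mae_manual = np.mean(np.abs(residuals))

print(mse_manual, mean_squared_error(y_true, y_pred))
print(mae_manual, mean_absolute_error(y_true, y_pred))
```

Squaring makes MSE weight large residuals more heavily than MAE does, which is why the single residual of -1.5 dominates the MSE here.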
Principles
Basic form
$$f(\pmb{x})=w_1 x_1+w_2 x_2+\dots+w_d x_d+b$$
Vector form
$$f(\pmb{x})=\pmb{\omega}^T\pmb{x}+b$$
where $\pmb{\omega}=(w_1,w_2,\dots,w_d)^T$ is the weight vector and $\pmb{x}=(x_1,x_2,\dots,x_d)^T$ is the feature vector of one sample.
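As a quick check that the basic form and the vector form are the same function, the sketch below evaluates both for hypothetical weights, bias and sample values (all made up for illustration). Stacking several samples as rows of a matrix `X` turns the dot product into `X @ w + b`, giving one prediction per sample.

```python
import numpy as np

# Hypothetical weights, bias and one sample with d = 3 features
w = np.array([0.5, -1.0, 2.0])  # omega = (w_1, w_2, w_3)^T
b = 4.0
x = np.array([2.0, 3.0, 1.0])   # x = (x_1, x_2, x_3)^T

# Basic form: w_1*x_1 + w_2*x_2 + ... + w_d*x_d + b
f_sum = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b

# Vector form: omega^T x + b, i.e. a dot product plus the bias
f_vec = w @ x + b

print(f_sum, f_vec)  # both 4.0

# Several samples at once: each row of X is one sample
X = np.array([[2.0, 3.0, 1.0],
              [1.0, 0.0, 0.0]])
print(X @ w + b)  # predictions 4.0 and 4.5
```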
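Goal: find the best $\pmb{\omega}$ and $b$, i.e. the ones that minimize the mean squared error over the $m$ training samples:
$$(\pmb{\omega}^*, b^*)=\arg\min_{\pmb{\omega},\,b}\ \frac{1}{m}\sum_{i=1}^{m}\left(\pmb{\omega}^T\pmb{x}_i+b-y_i\right)^2$$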
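The least-squares goal can be solved directly by folding $b$ into $\pmb{\omega}$: append a constant column of ones to the features and solve one linear least-squares problem. The sketch below does this with NumPy's `np.linalg.lstsq` on the same toy data as the official scikit-learn example later in this post ($y = x_1 + 2x_2 + 3$), then compares the result with `LinearRegression`. It illustrates the objective, and is not meant as scikit-learn's internal implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y = 1*x_1 + 2*x_2 + 3, no noise
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]], dtype=float)
y = X @ np.array([1.0, 2.0]) + 3.0

# Absorb b into omega: [x, 1] . [omega; b] = omega^T x + b
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Solve argmin ||X_aug theta - y||^2 with NumPy's least-squares solver
theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w_hat, b_hat = theta[:-1], theta[-1]

# scikit-learn fits the same ordinary least-squares objective
reg = LinearRegression().fit(X, y)

print(w_hat, b_hat)               # approximately [1. 2.] 3.0
print(reg.coef_, reg.intercept_)  # approximately [1. 2.] 3.0
```

Because the data are noise-free and the augmented matrix has full column rank, both approaches recover the generating weights up to floating-point error.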
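Overfitting
A model overfits when it fits the training data, including its noise, very closely but generalizes poorly to unseen data.
Regularization: add a penalty term on the weights to the original loss function, for example an L2 penalty $\alpha\|\pmb{\omega}\|_2^2$ (ridge regression) or an L1 penalty $\alpha\|\pmb{\omega}\|_1$ (lasso), where $\alpha \ge 0$ controls the strength of the penalty.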
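A minimal sketch of the effect of an L2 penalty, using scikit-learn's `Ridge` (whose objective is $\|y - X\pmb{\omega}\|_2^2 + \alpha\|\pmb{\omega}\|_2^2$) against plain `LinearRegression`. The data, the "true" weights and `alpha=10.0` are hypothetical choices for illustration; the point is only that the penalty shrinks the weight vector.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical noisy data: 20 samples, 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
true_w = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y = X @ true_w + 1.0 + rng.normal(scale=0.5, size=20)

# Ordinary least squares: no penalty on the weights
ols = LinearRegression().fit(X, y)
# Ridge: least squares plus alpha * ||w||^2 (the intercept is not penalized)
ridge = Ridge(alpha=10.0).fit(X, y)

print("OLS   ||w||:", np.linalg.norm(ols.coef_))
print("Ridge ||w||:", np.linalg.norm(ridge.coef_))  # smaller: the penalty shrinks the weights
```

Larger `alpha` shrinks the weights further; `alpha=0` gives back ordinary least squares. `Lasso` (L1 penalty) can additionally drive some weights exactly to zero.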
Practice
Boston house-price prediction
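# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this snippet only runs on scikit-learn < 1.2.
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load the features (13 columns) and the target (median home value, in $1000s)
boston_house = load_boston()
x = boston_house.data
y = boston_house.target

# Hold out 25% of the samples as the test set
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=33, test_size=0.25)

# Fit ordinary least squares on the training set, then predict the test set
lr = LinearRegression()
lr.fit(x_train, y_train)
lr_y_predict = lr.predict(x_test)

# Evaluate with the mean squared error on the test set
print("MSE:", mean_squared_error(y_test, lr_y_predict))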
Output (Jupyter in PyCharm; the first line is the value Jupyter echoes for lr.fit(...)):
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
MSE: 25.096985692067754
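Since `load_boston` no longer exists in scikit-learn 1.2 and later, here is a sketch of the same workflow that runs on current versions. It swaps in synthetic data from `make_regression` with the same shape as the Boston data (506 samples, 13 features); `noise=10.0` is an arbitrary choice, so the MSE it prints is not comparable to the 25.10 above. It also reports R² via `r2_score`.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Boston data: 506 samples, 13 features, Gaussian noise
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=33)

# Same split settings as the Boston example: 25% held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33, test_size=0.25)

# Fit on the training set, predict the test set
lr = LinearRegression().fit(X_train, y_train)
y_pred = lr.predict(X_test)

# Test-set error and coefficient of determination
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("MSE:", mse)
print("R^2:", r2)
```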
Official documentation
【Reference: sklearn.linear_model.LinearRegression - scikit-learn中文社区】
- contains many examples
>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
>>> y = np.dot(X, np.array([1, 2])) + 3
>>> reg = LinearRegression().fit(X, y)
>>> reg.score(X, y)
1.0
>>> reg.coef_
array([1., 2.])
>>> reg.intercept_
3.0000...
>>> reg.predict(np.array([[3, 5]]))
array([16.])
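For `LinearRegression`, `score` returns the coefficient of determination $R^2 = 1 - SS_{res}/SS_{tot}$. In the example above it is exactly 1.0 because the targets are noise-free. The sketch below keeps the same `X` but uses hypothetical noisy targets, so $R^2 < 1$, and checks that the manual formula, `reg.score` and `r2_score` agree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Same X as the official example, with hypothetical noisy targets
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.array([6.2, 7.9, 9.1, 11.0])

reg = LinearRegression().fit(X, y)
y_pred = reg.predict(X)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, reg.score(X, y), r2_score(y, y_pred))
```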
【Reference: 线性回归实例 (linear regression example) - scikit-learn中文社区】
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the diabetes dataset
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)
# Keep only one feature (column 2, BMI); np.newaxis keeps X 2-D as scikit-learn expects
diabetes_X = diabetes_X[:, np.newaxis, 2]

# Split into training and test sets: the last 20 samples are held out
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes_y[:-20]
diabetes_y_test = diabetes_y[-20:]

# Fit on the training set, then predict the test set
regr = linear_model.LinearRegression()
regr.fit(diabetes_X_train, diabetes_y_train)
diabetes_y_pred = regr.predict(diabetes_X_test)

# Slope, test MSE, and coefficient of determination (R^2; 1 is a perfect fit)
print('Coefficients: \n', regr.coef_)
print('Mean squared error: %.2f'
      % mean_squared_error(diabetes_y_test, diabetes_y_pred))
print('Coefficient of determination: %.2f'
      % r2_score(diabetes_y_test, diabetes_y_pred))

# Plot the test points (black) and the fitted line (blue)
plt.scatter(diabetes_X_test, diabetes_y_test, color='black')
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()
Output:
Coefficients:
[938.23786125]
Mean squared error: 2548.07
Coefficient of determination: 0.47
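With a single feature, the least-squares solution has a well-known closed form: slope $w = \sum_i (x_i-\bar{x})(y_i-\bar{y}) / \sum_i (x_i-\bar{x})^2$ and intercept $b = \bar{y} - w\bar{x}$. The sketch below applies it to the same training split as the diabetes example above (BMI feature, last 20 samples held out) and compares it with `LinearRegression`; the variable names are chosen here for illustration.

```python
import numpy as np
from sklearn import datasets, linear_model

# Same data and split as the example above: BMI (column 2), last 20 samples held out
X, y = datasets.load_diabetes(return_X_y=True)
x = X[:, 2]
x_train, y_train = x[:-20], y[:-20]

# Closed-form simple linear regression
x_mean, y_mean = x_train.mean(), y_train.mean()
w = np.sum((x_train - x_mean) * (y_train - y_mean)) / np.sum((x_train - x_mean) ** 2)
b = y_mean - w * x_mean

# scikit-learn needs a 2-D X, hence the np.newaxis
regr = linear_model.LinearRegression().fit(x_train[:, np.newaxis], y_train)

print(w, regr.coef_[0])  # should match the coefficient printed above
print(b, regr.intercept_)
```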