1. Model
y_i = θ^T x_i + ε_i, where x_i is a sample point, θ is the model parameter vector, ŷ_i = θ^T x_i is the model's predicted value, y_i is the true value, and ε_i is the difference between the true and predicted values
2. Objective function
Linear regression assumptions: (1) the errors ε_i follow a Gaussian distribution, ε_i ~ N(0, σ²); (2) the ε_i are independent. From these assumptions, maximum likelihood estimation yields the least-squares objective J(θ) = (1/2) Σ_i (y_i − θ^T x_i)²
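The step from the Gaussian/independence assumptions to least squares can be sketched as follows (maximum likelihood over m samples):

```latex
% likelihood under \varepsilon_i \sim N(0, \sigma^2), independent
L(\theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma}
  \exp\!\left(-\frac{(y_i - \theta^{T} x_i)^2}{2\sigma^2}\right)

% log-likelihood
\ell(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma}
  - \frac{1}{2\sigma^2} \sum_{i=1}^{m} (y_i - \theta^{T} x_i)^2

% maximizing \ell(\theta) is equivalent to minimizing the least-squares objective
J(\theta) = \frac{1}{2} \sum_{i=1}^{m} (y_i - \theta^{T} x_i)^2
```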
3. Solving the model
Gradient descent:
- Randomly initialize the parameter values θ
- Iterate along the negative gradient direction, θ := θ − α · ∂J(θ)/∂θ; each update makes the objective value smaller
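The two steps above can be sketched in a few lines; the learning rate, iteration count, and toy data below are illustrative choices, not part of the original notes:

```python
import numpy as np

rng = np.random.default_rng(0)

def gradient_descent(X, y, lr=0.5, n_iters=2000):
    """Minimize J(theta) = 1/(2m) * ||X @ theta - y||^2 by iterating
    along the negative gradient direction."""
    theta = rng.standard_normal(X.shape[1])      # random initialization
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / len(y)    # gradient of the objective
        theta -= lr * grad                       # step along the negative gradient
    return theta

# toy data: y = 1 + 2x, first column of X is the intercept term
X = np.c_[np.ones(50), np.linspace(0, 1, 50)]
y = X @ np.array([1.0, 2.0])
theta = gradient_descent(X, y)
print(theta)  # ≈ [1, 2]
```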
4. Tuning the model
Regularize the model parameters with an L1 penalty (lasso regression), an L2 penalty (ridge regression), or the elastic net to prevent overfitting
Split the samples into training, validation, and test sets; use cross-validation to choose the optimal hyperparameters, i.e. the regularization coefficient λ in the L1/L2 penalties, and λ together with the mixing ratio in the elastic net; then evaluate the model on the test set
5. Code
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.8, random_state=1)  # split into training and test sets
linreg = LinearRegression()  # build the model
linreg.fit(x_train, y_train)  # train the model
print(linreg.coef_, linreg.intercept_)  # print the fitted parameters
y_hat = linreg.predict(x_test)  # predict
mse = np.average((y_hat - np.array(y_test)) ** 2)  # evaluate: Mean Squared Error
- L1 regularization (lasso regression) / L2 regularization (ridge regression)
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1, train_size=0.8)
model = Lasso()  # build the model
#model = Ridge()
alpha_can = np.logspace(-2, 2, 10)  # candidate hyperparameter values
lasso_model = GridSearchCV(model, param_grid={'alpha': alpha_can}, cv=5)  # 5-fold cross-validation to pick the best hyperparameter
lasso_model.fit(x_train, y_train)  # train the model
print('best hyperparameters:\n', lasso_model.best_params_)
y_hat = lasso_model.predict(x_test)  # predict
mse = np.average((y_hat - np.array(y_test)) ** 2)  # Mean Squared Error
rmse = np.sqrt(mse)  # Root Mean Squared Error
print(mse, rmse)
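The elastic net mentioned in section 4 can be tuned the same way, searching over both the regularization strength and the L1/L2 mixing ratio; the toy data and parameter grids below are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split

# toy data: sparse true coefficients plus small Gaussian noise
rng = np.random.RandomState(1)
x = rng.randn(200, 5)
y = x @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.randn(200)

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1, train_size=0.8)
model = ElasticNet(max_iter=10000)  # build the model
param_grid = {'alpha': np.logspace(-3, 1, 5),   # overall regularization strength
              'l1_ratio': [0.1, 0.5, 0.9]}      # mix between L1 and L2 penalties
enet = GridSearchCV(model, param_grid=param_grid, cv=5)  # 5-fold cross-validation
enet.fit(x_train, y_train)  # train the model
print('best hyperparameters:', enet.best_params_)
y_hat = enet.predict(x_test)  # predict
mse = np.average((y_hat - y_test) ** 2)  # Mean Squared Error
print('test MSE:', mse)
```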