[Data Structures and Algorithms] Implementing Linear Models with sklearn


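This post implements several linear models with sklearn: linear, lasso, ridge and logistic regression. The training data is a house price prediction data set; the data file is available at https://download.csdn.net/download/d1240673769/20910882

Loading the house price data

import pandas as pd
df = pd.read_csv('sample_data_sets.csv')  # house price prediction data
print(df.columns)
df.head()  # in a notebook, displays the first five rows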


Creating the label variable

# Create the label variable
price_median = df['average_price'].median()   # median house price
print(price_median)

# Define a high-price flag (prices at or above the median count as high)
df['is_high'] = df['average_price'] >= price_median
print(df['is_high'].value_counts())
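
To make the labeling rule concrete, here is a minimal sketch on a hypothetical five-row table (the prices are made up for illustration and are not taken from the data file). Because the comparison is `>=`, a row whose price equals the median is labeled as high:

```python
import pandas as pd

# Hypothetical prices, used only to illustrate the labeling rule
toy = pd.DataFrame({'average_price': [100, 200, 300, 400, 500]})

toy_median = toy['average_price'].median()
# Vectorized comparison: True when the price is at or above the median
toy['is_high'] = toy['average_price'] >= toy_median

print(toy_median)
print(toy)
print(toy['is_high'].value_counts())
```

With an odd number of distinct prices the two classes cannot be perfectly balanced, since the median row itself falls into the high group.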


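Extracting the independent and dependent variables

# Independent variables (features)
x_train = df[['area', 'daypop', 'nightpop',
              'night20-39', 'sub_kde', 'bus_kde', 'kind_kde']].copy()
# Dependent variable: numeric
y_train = df['average_price'].copy()
# Dependent variable: categorical
y_label = df['is_high'].copy()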

Linear regression model

# Import Pipeline
from sklearn.pipeline import Pipeline

# Import the linear regression model
from sklearn.linear_model import LinearRegression

# Build the linear regression model
pipe_lm = Pipeline([
        ('lm_regr', LinearRegression(fit_intercept=True))   # fit_intercept=True fits an intercept term
        ])

# Train the linear regression model
pipe_lm.fit(x_train, y_train)

# Predict with the linear regression model
y_train_predict = pipe_lm.predict(x_train)

# Inspect the fitted coefficients
print(pipe_lm.named_steps['lm_regr'].coef_)

# Inspect the fitted intercept
print(pipe_lm.named_steps['lm_regr'].intercept_)
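
The predictions in y_train_predict are computed above but not scored. As a hedged sketch of how the fit could be checked, the example below uses synthetic data with known coefficients (the names X_demo, y_demo and true_coef are assumptions for illustration, not part of the house price data) and reports R² and RMSE on the training set:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import Pipeline

# Synthetic data: y = 2*x0 - 1*x1 + 0.5*x2 + 4 + small noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_coef + 4.0 + rng.normal(scale=0.1, size=200)

demo_pipe = Pipeline([('lm_regr', LinearRegression(fit_intercept=True))])
demo_pipe.fit(X_demo, y_demo)
y_demo_pred = demo_pipe.predict(X_demo)

# Goodness of fit on the same data the model was trained on
r2 = r2_score(y_demo, y_demo_pred)
rmse = np.sqrt(mean_squared_error(y_demo, y_demo_pred))

fitted_coef = demo_pipe.named_steps['lm_regr'].coef_
fitted_intercept = demo_pipe.named_steps['lm_regr'].intercept_
print(fitted_coef, fitted_intercept)
print(r2, rmse)
```

Scores computed on the training data tend to be optimistic; a held-out split would give a fairer estimate.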


# Extract the model coefficients
coef = pipe_lm.named_steps['lm_regr'].coef_

# Extract the matching feature names
features = x_train.columns.tolist()

# Build a DataFrame of the coefficients
coef_table = pd.DataFrame({'feature': features, 'coefficient': coef})
print(coef_table)


# Import matplotlib for plotting
import matplotlib.pyplot as plt
# Horizontal bar chart of the coefficients
coef_table.set_index(['feature']).plot.barh()
# Reference line at x = 0
plt.axvline(0, color='k')
# Show the figure
plt.show()


# Horizontal bar chart of the first four coefficients only
coef_table.set_index(['feature']).iloc[0:4].plot.barh()
# Reference line at x = 0
plt.axvline(0, color='k')
# Show the figure
plt.show()
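
The bar charts above plot raw coefficients, whose size depends on the units of each feature, so bars can differ by orders of magnitude. As a minimal sketch on synthetic data (the column names big_scale and small_scale and all numbers are made up for illustration), adding a StandardScaler step to the Pipeline puts the coefficients on a comparable scale:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = pd.DataFrame({
    'big_scale': rng.normal(scale=1000.0, size=300),    # values in the thousands
    'small_scale': rng.normal(scale=0.01, size=300),    # values in the hundredths
})
y = 0.002 * X['big_scale'] + 100.0 * X['small_scale'] + rng.normal(scale=0.1, size=300)

# Without scaling: coefficients reflect the units of each column
raw = Pipeline([('lm_regr', LinearRegression())]).fit(X, y)
# With scaling: each coefficient is the effect of a one-standard-deviation change
scaled = Pipeline([
    ('scaler', StandardScaler()),
    ('lm_regr', LinearRegression()),
]).fit(X, y)

raw_coef = raw.named_steps['lm_regr'].coef_
scaled_coef = scaled.named_steps['lm_regr'].coef_
print(pd.DataFrame({'feature': X.columns, 'raw': raw_coef, 'scaled': scaled_coef}))
```

In this toy setup the raw coefficient of small_scale dwarfs that of big_scale, while after standardization big_scale turns out to have the larger effect per standard deviation.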


Lasso regression model

# Import the lasso regression model
from sklearn.linear_model import Lasso

# Build the lasso regression model
pipe_lasso = Pipeline([
        ('lasso_regr', Lasso(alpha=500, fit_intercept=True))  # alpha controls the strength of the L1 penalty
        ])

# Train the lasso regression model
pipe_lasso.fit(x_train, y_train)
# Predict with the lasso regression model
y_train_predict = pipe_lasso.predict(x_train)

# Inspect the fitted coefficients
print(pipe_lasso.named_steps['lasso_regr'].coef_)

# Inspect the fitted intercept
print(pipe_lasso.named_steps['lasso_regr'].intercept_)

In the printed output (a screenshot in the original post), the coefficients of the last three features are shrunk to exactly 0.
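
The way the L1 penalty drives coefficients to zero can be reproduced on synthetic data. In the sketch below (X_demo, y_demo and the alpha values are assumptions chosen for illustration, not the values used with the house price data), only the first two of five features influence the target, and the number of zeroed coefficients grows with alpha:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
# Only the first two features carry signal; the other three are pure noise
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

zero_counts = {}
coefs = {}
for alpha in [0.01, 0.5, 5.0]:
    model = Lasso(alpha=alpha, fit_intercept=True).fit(X_demo, y_demo)
    coefs[alpha] = model.coef_
    # Count coefficients that the L1 penalty set exactly to zero
    zero_counts[alpha] = int(np.sum(model.coef_ == 0))
    print(alpha, model.coef_)

print(zero_counts)
```

A moderate alpha removes the noise features while keeping the two informative ones; a very large alpha removes everything, leaving an intercept-only model. The right alpha depends on the scale of the features and the target, which is why a value as large as 500 can be reasonable for unscaled house price data.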

# Extract the model coefficients
coef = pipe_lasso.named_steps['lasso_regr'].coef_

# Extract the matching feature names
features = x_train.columns.tolist()

# Build a DataFrame of the coefficients
coef_table = pd.DataFrame({'feature': features, 'coefficient': coef})
print(coef_table)


# Horizontal bar chart of the lasso coefficients
coef_table.set_index(['feature']).plot.barh()
# Reference line at x = 0
plt.axvline(0, color='k')
# Show the figure
plt.show()


Ridge regression model

# Import the ridge regression model
from sklearn.linear_model import Ridge

# Build the ridge regression model
pipe_ridge = Pipeline([
        ('ridge_regr', Ridge(alpha=500, fit_intercept=True, solver='lsqr'))  # solver selects the solving routine; 'lsqr' uses SciPy's regularized least-squares routine
        ])

# Train the ridge regression model
pipe_ridge.fit(x_train, y_train)
# Predict with the ridge regression model
y_train_predict = pipe_ridge.predict(x_train)

# Inspect the fitted coefficients
print(pipe_ridge.named_steps['ridge_regr'].coef_)

# Inspect the fitted intercept
print(pipe_ridge.named_steps['ridge_regr'].intercept_)
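
Unlike lasso, the L2 penalty in ridge regression shrinks coefficients toward zero without setting them exactly to zero. A minimal sketch on synthetic data (X_demo, y_demo and the alpha grid are illustrative assumptions) shows the coefficient norm falling as alpha grows:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

norms = {}
for alpha in [0.1, 10.0, 1000.0]:
    model = Ridge(alpha=alpha, fit_intercept=True, solver='lsqr').fit(X_demo, y_demo)
    # Euclidean norm of the coefficient vector: smaller means stronger shrinkage
    norms[alpha] = float(np.linalg.norm(model.coef_))
    print(alpha, model.coef_)

# Coefficients from the most heavily penalized fit
last_coef = model.coef_
print(norms)
```

Even at the largest alpha the coefficients stay nonzero, so ridge does not perform feature selection the way lasso does.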


# Extract the model coefficients
coef = pipe_ridge.named_steps['ridge_regr'].coef_

# Extract the matching feature names
features = x_train.columns.tolist()

# Build a DataFrame of the coefficients
coef_table = pd.DataFrame({'feature': features, 'coefficient': coef})
print(coef_table)


# Horizontal bar chart of the ridge coefficients
coef_table.set_index(['feature']).plot.barh()
# Reference line at x = 0
plt.axvline(0, color='k')
# Show the figure
plt.show()


Logistic regression model

# Import the logistic regression model
from sklearn.linear_model import LogisticRegression

# Build the logistic regression model
pipe_logistic = Pipeline([
        ('logistic_clf', LogisticRegression(penalty='l1', fit_intercept=True, solver='liblinear'))
        ])
# Train the logistic regression model on the categorical label
pipe_logistic.fit(x_train, y_label)
# Predict class labels with the logistic regression model
y_train_predict = pipe_logistic.predict(x_train)

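Logistic regression parameters explained:

penalty (default: 'l2')

'l1': L1 regularization
'l2': L2 regularization
'none': no regularization (recent scikit-learn versions expect None rather than the string 'none')

solver (the default was 'liblinear' in older releases; since scikit-learn 0.22 it is 'lbfgs')

'liblinear': coordinate descent; handles both L1 and L2 regularization and suits small data sets (roughly under 100,000 samples)
'sag': stochastic average gradient descent; of the two penalties it handles only L2, and it suits large data sets
'saga': a variant of sag; handles both L1 and L2 regularization and suits large data sets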
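
The penalty and solver combinations listed above can be tried on a small synthetic problem. The sketch below is illustrative only: X_demo, y_demo, the helper fit_l1, C=0.1 and max_iter=5000 are assumptions, and a StandardScaler step is added because sag and saga converge reliably only when features are on similar scales. It also checks that the 'lbfgs' solver rejects an L1 penalty:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
# Boolean label driven by the first two features plus noise
y_demo = (X_demo[:, 0] - X_demo[:, 1] + rng.normal(scale=0.5, size=300)) > 0

def fit_l1(solver):
    """Fit an L1-penalized logistic regression with the given solver (hypothetical helper)."""
    pipe = Pipeline([
        ('scaler', StandardScaler()),
        ('logistic_clf', LogisticRegression(penalty='l1', C=0.1,
                                            solver=solver, max_iter=5000)),
    ])
    return pipe.fit(X_demo, y_demo)

# Training accuracy for the two solvers that support L1
acc = {solver: fit_l1(solver).score(X_demo, y_demo) for solver in ['liblinear', 'saga']}
print(acc)

# 'lbfgs' does not support the L1 penalty and raises a ValueError
try:
    LogisticRegression(penalty='l1', solver='lbfgs').fit(X_demo, y_demo)
    lbfgs_supports_l1 = True
except ValueError:
    lbfgs_supports_l1 = False
print(lbfgs_supports_l1)
```

This is why the model above sets solver='liblinear' explicitly when using penalty='l1'.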

# Inspect the logistic regression coefficients
print(pipe_logistic.named_steps['logistic_clf'].coef_)

# Inspect the logistic regression intercept
print(pipe_logistic.named_steps['logistic_clf'].intercept_)
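
For a binary problem, coef_ has shape (1, n_features), and together with intercept_ it defines the predicted probability through the sigmoid function. The sketch below, on synthetic data (X_demo, y_demo and clf are assumptions for illustration), recomputes the probability of the positive class by hand and compares it with predict_proba:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = (X_demo[:, 0] - X_demo[:, 1] + rng.normal(scale=0.5, size=300)) > 0

clf = LogisticRegression(penalty='l1', fit_intercept=True, solver='liblinear')
clf.fit(X_demo, y_demo)

# Linear score z = X . w + b, then sigmoid(z) gives P(class == classes_[1])
z = X_demo @ clf.coef_[0] + clf.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba corresponds to classes_[1], which is True here
sk_proba = clf.predict_proba(X_demo)[:, 1]

print(clf.classes_)
print(np.max(np.abs(manual_proba - sk_proba)))
print(clf.score(X_demo, y_demo))  # training accuracy
```

A positive coefficient raises the probability of the high-price class as the feature increases, and a negative one lowers it.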


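# Extract the model coefficients (coef_ has shape (1, n_features) for a binary problem)
coef = pipe_logistic.named_steps['logistic_clf'].coef_[0]

# Extract the matching feature names
features = x_train.columns.tolist()

# Build a DataFrame of the coefficients
coef_table = pd.DataFrame({'feature': features, 'coefficient': coef})
print(coef_table)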


# Horizontal bar chart of the logistic regression coefficients
coef_table.set_index(['feature']).plot.barh()
# Reference line at x = 0
plt.axvline(0, color='k')
# Show the figure
plt.show()



Added: 2021-08-13 12:32:05  Updated: 2021-08-13 12:33:30
