
Python Study Notes: Linear Regression

Data Preprocessing

Missing Data

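1. .isnull() finds missing data. It returns a Boolean table (True where a value is missing); chaining .sum() counts the True values, i.e. the number of missing values in each column.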

from pandas import read_csv
import numpy as np
dataset = read_csv('pima-indians-diabetes.csv', header=None)
#print(dataset.describe())
print(dataset.head(20))
print(dataset.isnull().sum())

If these counts are large, the missing values must be handled first; otherwise they will distort the analysis.
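A minimal sketch of what .isnull() and .isnull().sum() return, using a small made-up DataFrame (hypothetical values, not the Pima data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column "b" has two missing values
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [np.nan, 5.0, np.nan]})

mask = df.isnull()          # Boolean DataFrame, True where a value is missing
counts = df.isnull().sum()  # True counts as 1, so this is the number of NaNs per column
print(mask)
print(counts)
```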

2. Option 1: replace missing values with 0

dataset[[0,2,3,4,5,6]] = dataset[[0,2,3,4,5,6]].replace(np.nan, 0)  # np.NaN was removed in NumPy 2.0; use np.nan

3. Option 2: drop the rows that contain missing data

This is reasonable when only a few rows have missing values.

dataset.dropna(inplace=True)

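P.S. inplace=True modifies the original DataFrame directly (and the call returns None); with inplace=False the original is left unchanged and a new DataFrame is returned (compare NumPy's ndarray.resize, which works in place, with reshape, which returns a reshaped array). If omitted, it defaults to False.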
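A small sketch of the difference, on a hypothetical three-row DataFrame where one row contains NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the second row has a missing value
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# Default (inplace=False): a new DataFrame is returned, df is untouched
dropped = df.dropna()
print(len(df), len(dropped))  # 3 2

# inplace=True: df itself is modified and the method returns None
result = df.dropna(inplace=True)
print(result, len(df))  # None 2
```

A common slip is writing df = df.dropna(inplace=True), which sets df to None.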

4. Option 3: replace missing values with the column mean

dataset.fillna(dataset.mean(), inplace=True)
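To compare options 1 and 3, here is a sketch on a hypothetical one-column DataFrame. Zero filling is written with fillna(0), which has the same effect on NaN values as replace(np.nan, 0) above:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value
df = pd.DataFrame({"a": [2.0, np.nan, 4.0]})

zero_filled = df.fillna(0)          # option 1: NaN -> 0
mean_filled = df.fillna(df.mean())  # option 3: NaN -> column mean (mean() skips NaN by default)

print(zero_filled["a"].tolist())  # [2.0, 0.0, 4.0]
print(mean_filled["a"].tolist())  # [2.0, 3.0, 4.0]
```

Filling with 0 pulls the column mean down (here from 3.0 to 2.0), while mean filling leaves it unchanged, which is why mean filling is often preferred when 0 is not a meaningful value.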

Normalization

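Different features often have different dimensions and units, which can distort the results of an analysis. To remove this effect and make the features comparable, the data should be rescaled before modelling.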

There are two common forms of normalization:

1. Import the scalers

from sklearn.preprocessing import StandardScaler, MinMaxScaler

2. MinMaxScaler linearly rescales each feature to the range [0, 1]: x' = (x - min) / (max - min)

xy_m = MinMaxScaler().fit_transform(xy)
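A quick check of the min-max formula against MinMaxScaler, on a hypothetical two-feature array whose columns have very different scales:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: column 0 in [1, 3], column 1 in [100, 500]
xy = np.array([[1.0, 100.0],
               [2.0, 300.0],
               [3.0, 500.0]])

xy_m = MinMaxScaler().fit_transform(xy)

# Same computation by hand, column by column
manual = (xy - xy.min(axis=0)) / (xy.max(axis=0) - xy.min(axis=0))
print(xy_m)  # both columns become [0, 0.5, 1]
```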

3. StandardScaler rescales each feature to mean 0 and variance 1: z = (x - mean) / std

xy_s = StandardScaler().fit_transform(xy)
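The same kind of check for StandardScaler, again on hypothetical data. Note that StandardScaler divides by the population standard deviation (ddof=0), which matches NumPy's default std:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data on two very different scales
xy = np.array([[1.0, 100.0],
               [2.0, 300.0],
               [3.0, 500.0]])

xy_s = StandardScaler().fit_transform(xy)

# By hand: subtract each column's mean, divide by its population std
manual = (xy - xy.mean(axis=0)) / xy.std(axis=0)
print(xy_s.mean(axis=0), xy_s.std(axis=0))  # approximately [0, 0] and [1, 1]
```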

Either way, all features end up on a comparable scale.

Simple Linear Regression

1. First, import everything we may need

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd 
import math
from sklearn import datasets, linear_model

2. Load the data into a DataFrame named Boston

3. Check for missing data

Boston.isnull().sum()

4. Set x and y, then fit the regression equation

x = Boston[['lstat']].values
y = Boston['medv'].values

regr = linear_model.LinearRegression()
regr.fit(x,y)
#regr.predict(x_0)

print('Coefficients:', regr.coef_)
print('Intercept:', regr.intercept_)

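x is selected with double brackets, so Boston[['lstat']] is a DataFrame and .values gives a 2-D array of shape (n, 1), which is the feature shape scikit-learn expects. y is always a single target column, so single brackets (a 1-D array) are enough.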
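A sketch of the resulting shapes, using a tiny hypothetical stand-in for the Boston DataFrame with only the two columns used above. Passing the 1-D Boston['lstat'].values to fit() instead would raise a ValueError asking for a 2-D array:

```python
import pandas as pd

# Hypothetical stand-in for Boston (values made up)
Boston = pd.DataFrame({"lstat": [5.0, 10.0, 15.0], "medv": [30.0, 22.0, 15.0]})

x = Boston[["lstat"]].values  # double brackets -> DataFrame -> 2-D array (n, 1)
y = Boston["medv"].values     # single brackets -> Series -> 1-D array (n,)
print(x.shape, y.shape)  # (3, 1) (3,)
```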

fit() estimates w0 and w1 in y = w0 + w1*x by ordinary least squares.

intercept_ = w0, coef_ = w1
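For a single feature, least squares has a closed form: w1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2) and w0 = mean(y) - w1 * mean(x). A sketch comparing it with LinearRegression on synthetic data (hypothetical, roughly following y = 1 + 2x):

```python
import numpy as np
from sklearn import linear_model

# Hypothetical synthetic data: y = 1 + 2x plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50).reshape(-1, 1)
y = 1.0 + 2.0 * x[:, 0] + rng.normal(0, 0.5, size=50)

# Closed-form simple linear regression
x1 = x[:, 0]
w1 = np.sum((x1 - x1.mean()) * (y - y.mean())) / np.sum((x1 - x1.mean()) ** 2)
w0 = y.mean() - w1 * x1.mean()

regr = linear_model.LinearRegression().fit(x, y)
print("closed form:", w0, w1)
print("sklearn:    ", regr.intercept_, regr.coef_[0])
```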

5. Evaluate the fitted equation

from sklearn import metrics

y_pred = regr.predict(x)
#print(metrics.explained_variance_score(y, y_pred))
print(metrics.mean_absolute_error(y, y_pred))
print(metrics.mean_squared_error(y, y_pred))

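Smaller MAE and MSE mean the predictions are closer to the observed values. Both are in units of y (MSE in squared units), so they are only comparable between models predicting the same target.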
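What the two metrics compute, checked by hand on three hypothetical observations and predictions:

```python
import numpy as np
from sklearn import metrics

# Hypothetical observed values and predictions
y = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

mae = np.mean(np.abs(y - y_pred))  # (1 + 0 + 2) / 3 = 1.0
mse = np.mean((y - y_pred) ** 2)   # (1 + 0 + 4) / 3, about 1.667
print(mae, metrics.mean_absolute_error(y, y_pred))
print(mse, metrics.mean_squared_error(y, y_pred))
```

Because errors are squared, the single miss of 2 contributes 4 of the 5 units in the MSE sum, so MSE punishes large errors more heavily than MAE does.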

Multiple Linear Regression

1. Import libraries and load the data

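import numpy as np
from sklearn import datasets, linear_model, metrics  # datasets and metrics are used below
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib
import matplotlib.pyplot as plt

diabetes = datasets.load_diabetes()
print(type(diabetes))
print(diabetes.feature_names)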

2. Set x and y, fit the regression equation, and plot the result

x = diabetes.data[:,[2,8]]
y = diabetes.target
regr = linear_model.LinearRegression()
regr.fit(x, y)
print('Coefficients:', regr.coef_)
print('Intercept:', regr.intercept_)

steps = 40
# Evenly spaced grid over each feature's range (linspace always returns exactly `steps` points)
lx0 = np.linspace(x[:,0].min(), x[:,0].max(), steps)
lx1 = np.linspace(x[:,1].min(), x[:,1].max(), steps)
xx0, xx1 = np.meshgrid(lx0, lx1)
xx = np.zeros(shape=(steps, steps, 2))
xx[:,:,0] = xx0
xx[:,:,1] = xx1

# Predict at every grid point, then reshape back into a grid for the surface plot
x_stack = xx.reshape(steps**2, 2)
y_stack = regr.predict(x_stack)
yy = y_stack.reshape(steps, steps)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) no longer works in recent Matplotlib
ax.scatter(x[:,0], x[:,1], y, color='red')
ax.plot_surface(xx0, xx1, yy, rstride=1, cstride=1)
plt.show()

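x takes the columns with index 2 and 8 (bmi and s5) from every row.

Because there are two features, coef_ holds two coefficients, w1 and w2.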
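With two features the model is y = w0 + w1*x1 + w2*x2. As a cross-check (a sketch, not the post's method), the same weights can be obtained with NumPy's general least-squares solver by adding a column of ones for the intercept:

```python
import numpy as np
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()
x = diabetes.data[:, [2, 8]]  # bmi and s5
y = diabetes.target

# Design matrix [1, x1, x2]: the first solved weight is the intercept w0
A = np.column_stack([np.ones(len(x)), x])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

regr = linear_model.LinearRegression().fit(x, y)
print("lstsq   w0, w1, w2:", w)
print("sklearn w0, w1, w2:", regr.intercept_, regr.coef_)
```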

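3. In practice, the model is built on one part of the data and evaluated on the remaining rows, which it did not see during fitting.

So first split the data, then fit the model on the training set.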

# Splitting data into training/testing
from sklearn.model_selection import train_test_split

# Split into training/testing sets
# 75% is training and 25% is testing data
X_train, X_test, y_train, y_test = train_test_split(x, y, 
                                                    test_size = 0.25,
                                                    random_state=123)

regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
print('Coefficients:', regr.coef_)
print('Intercept:', regr.intercept_)

Then evaluate it on the test set:

y_pred = regr.predict(X_test)

print('Test MAE:',metrics.mean_absolute_error(y_test, y_pred))
print('Test MSE:',metrics.mean_squared_error(y_test, y_pred))
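A sketch that also reports the training error, so the two can be compared: a test error far above the training error would suggest overfitting. The diabetes data has 442 rows, so test_size=0.25 gives 111 test rows (0.25 * 442 rounded up) and 331 training rows:

```python
from sklearn import datasets, linear_model, metrics
from sklearn.model_selection import train_test_split

diabetes = datasets.load_diabetes()
x = diabetes.data[:, [2, 8]]
y = diabetes.target

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=123)
print(len(X_train), len(X_test))  # 331 111

regr = linear_model.LinearRegression().fit(X_train, y_train)
train_mse = metrics.mean_squared_error(y_train, regr.predict(X_train))
test_mse = metrics.mean_squared_error(y_test, regr.predict(X_test))
print('Train MSE:', train_mse)
print('Test MSE:', test_mse)
```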


Added: 2021-08-30 12:01:04  Updated: 2021-08-30 12:02:44
