Step 1: Import the packages
import pandas as pd
import numpy as np
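Step 2: Load the data
# Adjust the path to wherever 50_Startups.csv is stored on your machine
dataset = pd.read_csv('D:/daily/机器学习100天/100-Days-Of-ML-Code-中文版本/100-Days-Of-ML-Code-master/datasets/50_Startups.csv')
X = dataset.iloc[ : , :-1].values   # features: R&D Spend, Administration, Marketing Spend, State
Y = dataset.iloc[ : , 4 ].values    # target: Profit (column index 4)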
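To see what the two iloc slices return, here is a small sketch on a toy DataFrame. It assumes the CSV's column order is R&D Spend, Administration, Marketing Spend, State, Profit; the numbers are made up purely for illustration.

```python
import pandas as pd

# Toy frame with the column layout assumed for 50_Startups.csv; values are invented.
df = pd.DataFrame({
    "R&D Spend": [100.0, 80.0, 60.0],
    "Administration": [50.0, 40.0, 30.0],
    "Marketing Spend": [200.0, 150.0, 100.0],
    "State": ["New York", "California", "Florida"],
    "Profit": [190.0, 150.0, 110.0],
})

X = df.iloc[:, :-1].values   # every column except the last -> feature matrix
Y = df.iloc[:, 4].values     # column index 4 ("Profit") -> target vector

print(X.shape, Y.shape)      # (3, 4) (3,)
print(X.dtype)               # object, because the State column holds strings
```

The object dtype is why State has to be encoded as numbers before it can be fed to LinearRegression.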
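Step 3: Encode the categorical variable (State)
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# Map the State strings to integer labels 0, 1, 2
labelencoder = LabelEncoder()
X_3 = labelencoder.fit_transform(X[:,3])
X[:,3] = X_3
print(X_3)
# One-hot encode the integer labels (OneHotEncoder expects a 2-D input)
State = X[:,3]
State = State.reshape(-1,1)
env = OneHotEncoder(categories = 'auto').fit(State)
res = env.transform(State).toarray()
# Keep the three numeric columns and append the three dummy columns
X = np.hstack((X[:, :3], res))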
Printed X_3:
[2 0 1 2 1 2 0 1 2 0 1 0 1 0 1 2 0 2 1 2 0 2 1 1 2 0 1 2 1 2 1 2 0 1 0 2 1
0 2 0 0 1 0 2 0 2 1 0 2 0]
Printed res:
[[0. 0. 1.]
[1. 0. 0.]
[0. 1. 0.]
[0. 0. 1.]
[0. 1. 0.]
[0. 0. 1.]
......
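That is, label 2 is one-hot encoded as [0,0,1], label 0 as [1,0,0], and label 1 as [0,1,0]. LabelEncoder numbers the states in alphabetical order, so 0 = California, 1 = Florida, 2 = New York.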
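A quick sketch of both encoders on a hypothetical four-row State column (the state names are the three that occur in 50_Startups). It checks the alphabetical label order and shows that OneHotEncoder can also take the strings directly, which would make the LabelEncoder step optional.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical State column with the three states found in 50_Startups.
states = np.array(["New York", "California", "Florida", "New York"], dtype=object)

# LabelEncoder assigns integers following the sorted order of the unique labels.
le = LabelEncoder()
codes = le.fit_transform(states)
print(le.classes_)   # California, Florida, New York
print(codes)         # [2 0 1 2]

# OneHotEncoder also accepts the raw strings, skipping the integer step.
enc = OneHotEncoder(categories="auto")
onehot = enc.fit_transform(states.reshape(-1, 1)).toarray()
print(enc.categories_[0])  # same alphabetical order as le.classes_
print(onehot)              # New York -> [0, 0, 1], California -> [1, 0, 0], Florida -> [0, 1, 0]
```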
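Step 4: Avoid the dummy variable trap
The three State dummy columns always add up to 1, so together with the model's intercept they are perfectly collinear. Dropping one of them (here the last one, New York, which becomes the reference category) removes the redundancy while keeping the State information. Slicing with X[:, :3] would instead discard all three dummies; the R² shown in Step 8 was obtained with that slicing, so keeping two dummies will change the score.
X = X[ : , :-1]   # 3 numeric columns + 2 of the 3 dummy columns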
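The trap can be checked numerically with a matrix rank. This sketch uses hypothetical State codes; it shows that an intercept column plus all three dummies is rank-deficient, that dropping one dummy restores full rank, and that OneHotEncoder's drop="first" option (available in scikit-learn 0.21 and later) does the same thing inside the encoder.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical State codes (0, 1, 2), one per row.
state = np.array([[2], [0], [1], [2], [1], [0]])
dummies = OneHotEncoder(categories="auto").fit_transform(state).toarray()

# Every row of the dummies sums to 1, i.e. the dummies reproduce the intercept column.
ones = np.ones((len(state), 1))
full = np.hstack((ones, dummies))            # intercept + all 3 dummies: 4 columns
print(np.linalg.matrix_rank(full))           # 3, not 4 -> perfectly collinear

# Dropping one dummy (the reference category) removes the redundancy.
reduced = np.hstack((ones, dummies[:, 1:]))  # intercept + 2 dummies: 3 columns
print(np.linalg.matrix_rank(reduced))        # 3 -> full column rank

# The encoder can drop the first category itself.
dropped = OneHotEncoder(categories="auto", drop="first").fit_transform(state).toarray()
print(dropped.shape)                         # (6, 2)
```

Which dummy is dropped does not matter for the predictions; it only changes which state the other coefficients are measured against.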
Step 5: Split into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)
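With 50 rows and test_size = 0.2, the split leaves 40 rows for training and 10 for testing, and a fixed random_state makes the shuffle reproducible. The sketch below uses a placeholder matrix (only the row count matters; the contents are arbitrary).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 50 rows like the 50 startups; values and column count are arbitrary.
X = np.arange(50 * 5).reshape(50, 5)
Y = np.arange(50)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
print(X_train.shape, X_test.shape)   # (40, 5) (10, 5)

# Re-running with the same random_state yields exactly the same split.
_, X_test2, _, _ = train_test_split(X, Y, test_size=0.2, random_state=0)
print(np.array_equal(X_test, X_test2))  # True
```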
Step 6: Fit the multiple linear regression model
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, Y_train)
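LinearRegression fits ordinary least squares: it picks the intercept and coefficients that minimise the sum of squared residuals. As a sketch on synthetic data (the true coefficients 3, 2, -1 are arbitrary choices), the same solution can be obtained with NumPy's least-squares solver by adding a column of ones for the intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 3 + 2*x1 - 1*x2 + small noise (coefficients chosen for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3 + 2 * X[:, 0] - 1 * X[:, 1] + 0.01 * rng.normal(size=100)

reg = LinearRegression().fit(X, y)

# Ordinary least squares by hand: prepend a column of ones for the intercept.
A = np.hstack((np.ones((len(X), 1)), X))
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

print(reg.intercept_, reg.coef_)   # close to 3 and [2, -1]
print(beta)                        # [intercept, b1, b2], matching the values above
```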
Step 7: Predict
y_pred = regressor.predict(X_test)
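predict() simply evaluates the fitted linear equation, i.e. each row of X_test times coef_ plus intercept_. A sketch on a hypothetical training set that lies exactly on y = 1 + 2*x1 + 3*x2:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical points lying exactly on y = 1 + 2*x1 + 3*x2.
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_train = 1 + 2 * X_train[:, 0] + 3 * X_train[:, 1]
reg = LinearRegression().fit(X_train, y_train)

X_test = np.array([[2.0, 1.0], [0.5, 0.5]])
y_pred = reg.predict(X_test)

# The same numbers from the linear combination of features plus the intercept.
manual = X_test @ reg.coef_ + reg.intercept_
print(y_pred)   # approximately 8.0 and 3.5
print(manual)
```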
Step 8: Evaluate the regression
from sklearn.metrics import r2_score
print(Y_test)
print(y_pred)
print(r2_score(Y_test, y_pred))
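Printed R²: 0.9393955917820571
R² is the coefficient of determination (goodness of fit). The better the model, the closer R² is to 1; a model that does no better than always predicting the mean of Y_test scores 0, and a worse model can even score below 0.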
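R² is defined as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the true values from their mean. The sketch below computes it by hand on hypothetical numbers and compares it with r2_score, including the mean-only and worse-than-mean cases.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical true values and predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares: 1.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares: 20.0
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))       # both 0.925

# Always predicting the mean gives 0; doing worse than that gives a negative score.
print(r2_score(y_true, np.full_like(y_true, y_true.mean())))  # 0.0
print(r2_score(y_true, y_true[::-1]))                          # -3.0
```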
Complete code:
import pandas as pd
import numpy as np
dataset = pd.read_csv('D:/daily/机器学习100天/100-Days-Of-ML-Code-中文版本/100-Days-Of-ML-Code-master/datasets/50_Startups.csv')
X = dataset.iloc[ : , :-1].values
Y = dataset.iloc[ : , 4 ].values
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder = LabelEncoder()
X_3 = labelencoder.fit_transform(X[:,3])
X[:,3] = X_3
print(X_3)
State = X[:,3]
State = State.reshape(-1,1)
env = OneHotEncoder(categories = 'auto').fit(State)
res = env.transform(State).toarray()
X = np.hstack((X[:, :3], res))
X = X[ : , :-1]   # drop one dummy column to avoid the dummy variable trap
print(X)
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, Y_train)
Y_pred = regressor.predict(X_test)
from sklearn.metrics import r2_score
print(r2_score(Y_test, Y_pred))