Classification
- sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=None, random_state=None)
- C: penalty parameter of C-SVC, default 1.0. A larger C penalizes the slack variables more heavily, pushing them toward 0, i.e., misclassification errors are punished more.
- kernel: the kernel function, default 'rbf'. Options: 'linear' (linear kernel), 'poly' (polynomial kernel), 'rbf' (Gaussian/RBF kernel), 'sigmoid' (sigmoid kernel), 'precomputed' (a user-supplied precomputed kernel matrix).
  - RBF: the Gaussian radial basis function kernel
  - sigmoid: the hyperbolic tangent (sigmoid) kernel
- degree: degree of the polynomial ('poly') kernel, default 3; ignored by all other kernels. A minimal usage sketch follows this list.
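The sketch below shows how these parameters combine when constructing classifiers; the concrete values for C, gamma, degree, and coef0 are illustrative assumptions, not tuned settings from the original.

from sklearn.svm import SVC

# RBF kernel: gamma controls the kernel width (larger gamma -> more flexible boundary); value is illustrative
clf_rbf = SVC(C=1.0, kernel='rbf', gamma=0.5)
# Polynomial kernel: degree is only honored when kernel='poly'
clf_poly = SVC(C=1.0, kernel='poly', degree=3, coef0=1.0)
# Linear kernel: degree/gamma/coef0 are ignored
clf_linear = SVC(C=10.0, kernel='linear')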
Regression
- Analogous to classification (a minimal construction sketch follows the signature below).
- sklearn.svm.SVR(kernel='rbf', degree=3, gamma='auto_deprecated', coef0=0.0, tol=0.001, C=1.0, epsilon=0.1, shrinking=True, cache_size=200, verbose=False, max_iter=-1)
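A minimal construction sketch: epsilon defines the half-width of the tube within which prediction errors incur no loss. The values below are the defaults or illustrative assumptions.

from sklearn.svm import SVR

# points within +/- epsilon of the prediction contribute no loss
svr = SVR(kernel='rbf', C=1.0, gamma='auto', epsilon=0.1)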
SVM for classification
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC # SVM classifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib as mpl
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
Data preprocessing: encode the class labels.
data = pd.read_csv(r'txt', header=None)  # path placeholder kept from the original; the file holds the iris data
x = data.iloc[:, :2]  # use only the first two features
y = data.iloc[:, -1]  # last column: class labels, taken as a 1-D Series for LabelEncoder
label = LabelEncoder()
y = label.fit_transform(y)
print(y)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
# penalty coefficient C=0.8; RBF kernel with gamma=20 (gamma sets the kernel width: the larger it is, the more flexible the decision boundary)
clf = SVC(C=0.8, kernel='rbf', gamma=20, decision_function_shape='ovr')
clf.fit(x_train, y_train.ravel())
SVC(C=0.8, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=20, kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
Accuracy on the training set
y_train_hat = clf.predict(x_train)
print(accuracy_score(y_train, y_train_hat))
0.857142857143
Accuracy on the test set
y_test_hat = clf.predict(x_test)
print(accuracy_score(y_test, y_test_hat))
0.8
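Beyond overall accuracy, per-class precision and recall can be inspected with sklearn's classification_report; the exact numbers depend on the random train/test split.

from sklearn.metrics import classification_report
print(classification_report(y_test, y_test_hat))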
iris_feature = 'sepal length', 'sepal width', 'petal length', 'petal width'
x1_min, x1_max = x.iloc[:, 0].min(), x.iloc[:, 0].max() # range of column 0
x2_min, x2_max = x.iloc[:, 1].min(), x.iloc[:, 1].max() # range of column 1
x1, x2 = np.mgrid[x1_min:x1_max:500j, x2_min:x2_max:500j] # generate a 500x500 grid of sample points
grid_test = np.stack((x1.flat, x2.flat), axis=1) # grid points as test inputs
Z = clf.decision_function(grid_test) # decision-function scores (one per class under 'ovr')
print(Z)
grid_hat = clf.predict(grid_test) # predicted class for each grid point
print(grid_hat)
[[-0.09643098 1.03911255 2.05731843]
[-0.09643134 1.0391132 2.05731814]
[-0.09643172 1.03911387 2.05731785]
...,
[-0.09656833 1.03889045 2.05767788]
[-0.0965521 1.03891148 2.05764062]
[-0.09653751 1.03893039 2.05760712]]
[2 2 2 ..., 2 2 2]
grid_hat = grid_hat.reshape(x1.shape) # reshape to match the grid
mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False
cm_light = mpl.colors.ListedColormap(['#A0FFA0', '#FFA0A0', '#A0A0FF'])
cm_dark = mpl.colors.ListedColormap(['g', 'r', 'b'])
plt.pcolormesh(x1, x2, grid_hat, cmap=cm_light)
plt.scatter(x.iloc[:, 0], x.iloc[:, 1], c=y, edgecolors='k', s=50, cmap=cm_dark) # samples colored by class
plt.scatter(x_test.iloc[:, 0], x_test.iloc[:, 1], s=120, facecolors='none', zorder=10) # circle the test-set samples
plt.xlabel(iris_feature[0], fontsize=13)
plt.ylabel(iris_feature[1], fontsize=13)
plt.xlim(x1_min, x1_max)
plt.ylim(x2_min, x2_max)
plt.title('Iris SVM classification on two features', fontsize=15)
plt.grid()
plt.show()
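A gamma as large as 20 can easily overfit; one standard way to choose C and gamma is cross-validated grid search. A minimal sketch, reusing x_train and y_train from above; the grid values are illustrative assumptions.

from sklearn.model_selection import GridSearchCV
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.1, 1, 10, 20]}  # illustrative grid
search = GridSearchCV(SVC(kernel='rbf', decision_function_shape='ovr'), param_grid, cv=5)
search.fit(x_train, y_train)
print(search.best_params_, search.best_score_)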
SVM for regression
import numpy as np
from sklearn.svm import SVR # SVM regressor
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
N = 50  # number of samples
np.random.seed(0)  # fix the random seed for reproducibility
# sort x so the curves drawn later are not jumbled
x = np.sort(np.random.uniform(0, 6, N), axis=0)
y = 2*np.sin(x) + 0.1*np.random.randn(N)  # add noise
x = x.reshape(-1, 1)
print('x =\n', x.T)
print('y =\n', y)
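The three fitted regressors used below are never constructed in the original; a minimal sketch, with assumed hyperparameters (the C, gamma, and degree values are illustrative):

# hypothetical hyperparameters; the original does not specify them
svr_rbf = SVR(kernel='rbf', gamma=0.2, C=100)
svr_linear = SVR(kernel='linear', C=100)
svr_poly = SVR(kernel='poly', degree=3, C=100)
svr_rbf.fit(x, y)
svr_linear.fit(x, y)
svr_poly.fit(x, y)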
x_test = np.linspace(x.min(), 1.2*x.max(), 100).reshape(-1, 1)
y_rbf = svr_rbf.predict(x_test)
y_linear = svr_linear.predict(x_test)
y_poly = svr_poly.predict(x_test)
plt.rcParams['font.sans-serif'] = [u'SimHei']
plt.rcParams['axes.unicode_minus'] = False
plt.figure(figsize=(9, 8), facecolor='w')
plt.plot(x_test, y_rbf, 'r-', linewidth=2, label='RBF kernel')
plt.plot(x_test, y_linear, 'g-', linewidth=2, label='linear kernel')
plt.plot(x_test, y_poly, 'b-', linewidth=2, label='polynomial kernel')
plt.plot(x, y, 'mo', markersize=6)
plt.scatter(x[svr_rbf.support_], y[svr_rbf.support_], s=130, c='r', marker='*', label='RBF support vectors')
plt.legend(loc='lower left')
plt.title('SVR', fontsize=16)
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)
plt.show()
When SVM is used for regression, the separating boundary of the classification case is expressed as a regression curve.
Summary: of the three kernels, the RBF (Gaussian) kernel gives the best fit to this data.
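One way to back this up numerically is to compare the mean squared error of the three fits on the training points (a rough proxy; evaluating on held-out points would be more rigorous):

from sklearn.metrics import mean_squared_error
for name, model in [('rbf', svr_rbf), ('linear', svr_linear), ('poly', svr_poly)]:
    print(name, mean_squared_error(y, model.predict(x)))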