Classification
- sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=None, random_state=None)
- C: penalty parameter of C-SVC, default 1.0. A larger C penalizes the slack variables more heavily, pushing them toward 0, i.e., misclassification errors are punished more.
- kernel: the kernel function, default 'rbf'. Options: 'linear' (linear kernel), 'poly' (polynomial kernel), 'rbf' (Gaussian/RBF kernel), 'sigmoid' (sigmoid kernel), 'precomputed' (a user-supplied precomputed kernel matrix).
  - RBF: the Gaussian radial basis function kernel
  - sigmoid: the hyperbolic tangent (sigmoid) kernel
- degree: degree of the polynomial ('poly') kernel, default 3; ignored by all other kernels. A minimal usage sketch follows this list.
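The sketch below shows how these parameters combine when constructing classifiers; the concrete values for C, gamma, degree, and coef0 are illustrative assumptions, not tuned settings from the original.

from sklearn.svm import SVC

# RBF kernel: gamma controls the kernel width (larger gamma -> more flexible boundary); value is illustrative
clf_rbf = SVC(C=1.0, kernel='rbf', gamma=0.5)
# Polynomial kernel: degree is only honored when kernel='poly'
clf_poly = SVC(C=1.0, kernel='poly', degree=3, coef0=1.0)
# Linear kernel: degree/gamma/coef0 are ignored
clf_linear = SVC(C=10.0, kernel='linear')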
Regression
- Analogous to classification (a minimal construction sketch follows the signature below).
- sklearn.svm.SVR(kernel='rbf', degree=3, gamma='auto_deprecated', coef0=0.0, tol=0.001, C=1.0, epsilon=0.1, shrinking=True, cache_size=200, verbose=False, max_iter=-1)
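A minimal construction sketch: epsilon defines the half-width of the tube within which prediction errors incur no loss. The values below are the defaults or illustrative assumptions.

from sklearn.svm import SVR

# points within +/- epsilon of the prediction contribute no loss
svr = SVR(kernel='rbf', C=1.0, gamma='auto', epsilon=0.1)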
SVM for classification
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC # SVM classifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib as mpl
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
Data preprocessing: encode the class labels.
data = pd.read_csv(r'txt', header=None)  # path placeholder kept from the original; the file holds the iris data
x = data.iloc[:, :2]  # use only the first two features
y = data.iloc[:, -1]  # last column: class labels, taken as a 1-D Series for LabelEncoder
label = LabelEncoder()
y = label.fit_transform(y)
print(y)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
# penalty coefficient C=0.8; RBF kernel with gamma=20 (gamma sets the kernel width: the larger it is, the more flexible the decision boundary)
clf = SVC(C=0.8, kernel='rbf', gamma=20, decision_function_shape='ovr')
clf.fit(x_train, y_train.ravel())
SVC(C=0.8, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=20, kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
Accuracy on the training set
y_train_hat = clf.predict(x_train)
print(accuracy_score(y_train, y_train_hat))
0.857142857143
Accuracy on the test set
y_test_hat = clf.predict(x_test)
print(accuracy_score(y_test, y_test_hat))
0.8
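Beyond overall accuracy, per-class precision and recall can be inspected with sklearn's classification_report; the exact numbers depend on the random train/test split.

from sklearn.metrics import classification_report
print(classification_report(y_test, y_test_hat))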
iris_feature = 'sepal length', 'sepal width', 'petal length', 'petal width'
x1_min, x1_max = x.iloc[:, 0].min(), x.iloc[:, 0].max() # range of column 0
x2_min, x2_max = x.iloc[:, 1].min(), x.iloc[:, 1].max() # range of column 1
x1, x2 = np.mgrid[x1_min:x1_max:500j, x2_min:x2_max:500j] # generate a 500x500 grid of sample points
grid_test = np.stack((x1.flat, x2.flat), axis=1) # grid points as test inputs
Z = clf.decision_function(grid_test) # decision-function scores (one per class under 'ovr')
print(Z)
grid_hat = clf.predict(grid_test) # predicted class for each grid point
print(grid_hat)
[[-0.09643098 1.03911255 2.05731843]
[-0.09643134 1.0391132 2.05731814]
[-0.09643172 1.03911387 2.05731785]
...,
[-0.09656833 1.03889045 2.05767788]
[-0.0965521 1.03891148 2.05764062]
[-0.09653751 1.03893039 2.05760712]]
[2 2 2 ..., 2 2 2]
grid_hat = grid_hat.reshape(x1.shape) # reshape to match the grid
mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False
cm_light = mpl.colors.ListedColormap(['#A0FFA0', '#FFA0A0', '#A0A0FF'])
cm_dark = mpl.colors.ListedColormap(['g', 'r', 'b'])
plt.pcolormesh(x1, x2, grid_hat, cmap=cm_light)
plt.scatter(x.iloc[:, 0], x.iloc[:, 1], c=y, edgecolors='k', s=50, cmap=cm_dark) # samples colored by class
plt.scatter(x_test.iloc[:, 0], x_test.iloc[:, 1], s=120, facecolors='none', zorder=10) # circle the test-set samples
plt.xlabel(iris_feature[0], fontsize=13)
plt.ylabel(iris_feature[1], fontsize=13)
plt.xlim(x1_min, x1_max)
plt.ylim(x2_min, x2_max)
plt.title('Iris SVM classification on two features', fontsize=15)
plt.grid()
plt.show()
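A gamma as large as 20 can easily overfit; one standard way to choose C and gamma is cross-validated grid search. A minimal sketch, reusing x_train and y_train from above; the grid values are illustrative assumptions.

from sklearn.model_selection import GridSearchCV
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.1, 1, 10, 20]}  # illustrative grid
search = GridSearchCV(SVC(kernel='rbf', decision_function_shape='ovr'), param_grid, cv=5)
search.fit(x_train, y_train)
print(search.best_params_, search.best_score_)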
SVM for regression
import numpy as np
from sklearn.svm import SVR # SVM regressor
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
N = 50  # number of samples
np.random.seed(0)  # fix the random seed for reproducibility
# sort x so the curves drawn later are not jumbled
x = np.sort(np.random.uniform(0, 6, N), axis=0)
y = 2*np.sin(x) + 0.1*np.random.randn(N)  # add noise
x = x.reshape(-1, 1)
print('x =\n', x.T)
print('y =\n', y)
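The three fitted regressors used below are never constructed in the original; a minimal sketch, with assumed hyperparameters (the C, gamma, and degree values are illustrative):

# hypothetical hyperparameters; the original does not specify them
svr_rbf = SVR(kernel='rbf', gamma=0.2, C=100)
svr_linear = SVR(kernel='linear', C=100)
svr_poly = SVR(kernel='poly', degree=3, C=100)
svr_rbf.fit(x, y)
svr_linear.fit(x, y)
svr_poly.fit(x, y)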
x_test = np.linspace(x.min(), 1.2*x.max(), 100).reshape(-1, 1)
y_rbf = svr_rbf.predict(x_test)
y_linear = svr_linear.predict(x_test)
y_poly = svr_poly.predict(x_test)
plt.rcParams['font.sans-serif'] = [u'SimHei']
plt.rcParams['axes.unicode_minus'] = False
plt.figure(figsize=(9, 8), facecolor='w')
plt.plot(x_test, y_rbf, 'r-', linewidth=2, label='RBF kernel')
plt.plot(x_test, y_linear, 'g-', linewidth=2, label='linear kernel')
plt.plot(x_test, y_poly, 'b-', linewidth=2, label='polynomial kernel')
plt.plot(x, y, 'mo', markersize=6)
plt.scatter(x[svr_rbf.support_], y[svr_rbf.support_], s=130, c='r', marker='*', label='RBF support vectors')
plt.legend(loc='lower left')
plt.title('SVR', fontsize=16)
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True)
plt.show()
When SVM is used for regression, the separating boundary of the classification case is expressed as a regression curve.
Summary: of the three kernels, the RBF (Gaussian) kernel gives the best fit to this data.
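One way to back this up numerically is to compare the mean squared error of the three fits on the training points (a rough proxy; evaluating on held-out points would be more rigorous):

from sklearn.metrics import mean_squared_error
for name, model in [('rbf', svr_rbf), ('linear', svr_linear), ('poly', svr_poly)]:
    print(name, mean_squared_error(y, model.predict(x)))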