1 Linear SVM classifier
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)  # Iris virginica
svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_svc", LinearSVC(C=1, loss="hinge", random_state=42)),
])
svm_clf.fit(X, y)
svm_clf.predict([[5.5, 1.7]])
# array([1.])
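A small follow-up sketch (my addition, reusing the fitted svm_clf above): LinearSVC does not provide predict_proba, but decision_function returns the signed distance to the decision boundary, positive for Iris virginica here.
svm_clf.decision_function([[5.5, 1.7]])  # signed distance to the boundary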
Main advantage of the linear SVM: it is fast.
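A rough timing sketch of that claim (my addition, not from the original notes; the iris set here is tiny, so the gap only becomes pronounced on large training sets): fitting LinearSVC versus SVC with a linear kernel on the same data.
import time
from sklearn.svm import SVC
for name, clf in [("LinearSVC", LinearSVC(C=1, loss="hinge", random_state=42)),
                  ("SVC(kernel='linear')", SVC(kernel="linear", C=1))]:
    t0 = time.perf_counter()
    clf.fit(X, y)  # X, y: the iris petal features and labels defined above
    print(name, "fit time:", time.perf_counter() - t0, "seconds")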
2 Nonlinear SVM classifiers
from sklearn.datasets import make_moons
X, y = make_moons(n_samples=100, noise=0.15, random_state=42)
from sklearn.svm import SVC
poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5)),
])
poly_kernel_svm_clf.fit(X, y)
rbf_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="rbf", gamma=5, C=0.001)),
])
rbf_kernel_svm_clf.fit(X, y)
On choosing a kernel:
1 Try the linear kernel first. In scikit-learn, LinearSVC is faster than SVC(kernel="linear"), especially when the training set is very large, so prefer LinearSVC.
2 If the linear kernel does not work well, try the Gaussian RBF kernel next; it gives good results most of the time.
3 If you have the time to spare, use cross-validation and grid search to try other kernels as well (see the sketch after this list).
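A minimal sketch of point 3 (my addition; the parameter grid and its values are hypothetical choices, and X, y are the moons data from above):
from sklearn.model_selection import GridSearchCV
param_grid = [
    {"svm_clf__kernel": ["rbf"], "svm_clf__gamma": [0.1, 1, 5], "svm_clf__C": [0.001, 1, 100]},
    {"svm_clf__kernel": ["poly"], "svm_clf__degree": [2, 3], "svm_clf__coef0": [0, 1], "svm_clf__C": [1, 5]},
]
search_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC()),
])
grid_search = GridSearchCV(search_pipeline, param_grid, cv=5)
grid_search.fit(X, y)  # X, y: the moons data from above
grid_search.best_params_, grid_search.best_score_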
3 SVM regression
np.random.seed(42)
m = 50
X = 2 * np.random.rand(m, 1)
y = (4 + 3 * X + np.random.randn(m, 1)).ravel()  # noisy linear target
from sklearn.svm import LinearSVR
svm_reg = LinearSVR(epsilon=1.5, random_state=42)
svm_reg.fit(X, y)
from sklearn.svm import SVR
svm_poly_reg = SVR(kernel="poly", degree=2, C=100, epsilon=0.1, gamma="scale")
svm_poly_reg.fit(X, y)
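A small check on the epsilon-insensitive loss (my addition, reusing svm_reg and the data above): points whose residual stays strictly inside the ±epsilon tube contribute zero loss, so only the points outside the tube shape the fit.
y_pred = svm_reg.predict(X)
outside_tube = np.abs(y - y_pred) >= svm_reg.epsilon
print("training points outside the epsilon tube:", outside_tube.sum(), "of", m)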
Reference: Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition