1. RandomForestClassifier (random forest)
Pros: highly randomized, broadly applicable. Cons: results are hard to interpret.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

clf = RandomForestClassifier(n_estimators=100)
params = {'n_estimators': range(1, 200)}
cv = GridSearchCV(clf, param_grid=params, verbose=1, n_jobs=-1, scoring='roc_auc')
cv.fit(train_X, train_y)
rfc = cv.best_estimator_
cv.best_score_
OUT: 0.7406656860482378
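The same random-forest grid-search workflow can be reproduced end to end on synthetic data. This is an illustrative sketch, not the original dataset: `make_classification` stands in for `train_X`/`train_y`, and a coarser `n_estimators` grid keeps the search fast (the notes above sweep `range(1, 200)`).

```python
# Illustrative sketch on synthetic data (the real notes use train_X/train_y).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

clf = RandomForestClassifier(random_state=0)
# Coarser grid than range(1, 200), purely to keep this example quick
params = {'n_estimators': [10, 50, 100, 150]}
cv = GridSearchCV(clf, param_grid=params, n_jobs=-1, scoring='roc_auc')
cv.fit(X, y)

print(cv.best_params_)  # the winning n_estimators value
print(cv.best_score_)   # mean cross-validated ROC AUC of the best setting
```

A coarse-then-fine grid (e.g. steps of 50, then steps of 5 around the winner) gives nearly the same result as sweeping every integer, at a fraction of the fitting cost.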
2. GradientBoostingClassifier
Pros: strong results after tuning, and fairly interpretable. Cons: less broadly applicable.
from sklearn.ensemble import GradientBoostingClassifier

clf = GradientBoostingClassifier(
    loss='log_loss',            # formerly 'deviance'; renamed in scikit-learn 1.1
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    subsample=1.0,
    min_samples_split=2,
    min_samples_leaf=1,
    max_features=None,
    max_leaf_nodes=None,
    min_impurity_decrease=0.0,  # replaces min_impurity_split, removed in scikit-learn 1.0
    verbose=0,
    warm_start=False,
    random_state=0,
)
params = {'n_estimators': range(1, 200)}
cv = GridSearchCV(clf, param_grid=params, verbose=1, n_jobs=-1, scoring='roc_auc')
cv.fit(train_X, train_y)
gbc = cv.best_estimator_
cv.best_score_
OUT: 0.7624008038357515
3. AdaBoostClassifier
Pros: highly randomized, broadly applicable, interpretable, and performs well. Cons: the algorithm is relatively complex.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

clf = AdaBoostClassifier()
params = {'n_estimators': range(1, 200)}
cv = GridSearchCV(clf, param_grid=params, verbose=1, n_jobs=-1, scoring='roc_auc')
cv.fit(train_X, train_y)
abc = cv.best_estimator_
cv.best_score_
OUT: 0.7610013009463877
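The three ensembles compared in these notes can also be scored side by side with cross-validated ROC AUC in one pass. A minimal sketch, again on synthetic data rather than the original `train_X`/`train_y`, using the scores' dictionary keys purely as labels:

```python
# Compare the three ensemble models with cross-validated ROC AUC
# on one synthetic dataset (illustrative only).
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    'rfc': RandomForestClassifier(n_estimators=100, random_state=0),
    'gbc': GradientBoostingClassifier(n_estimators=100, random_state=0),
    'abc': AdaBoostClassifier(n_estimators=100, random_state=0),
}
scores = {name: cross_val_score(m, X, y, scoring='roc_auc').mean()
          for name, m in models.items()}
print(scores)
```

On the notes' own dataset the ranking was gbc > abc > rfc (0.762 > 0.761 > 0.741); on a different dataset the ordering can easily differ, which is why a comparison like this is worth rerunning per problem.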