[Artificial Intelligence] Coogle Study: LightGBM Task 3

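Task 3: Classification, regression, and ranking tasks

  • Step 1: Learn to use LightGBM's sklearn interface and import the classification, regression, and ranking models.
  • Step 2: Learn to use LightGBM's native train interface.
  • Step 3: Binary classification task
    1. Use make_classification to create a binary classification dataset.
    2. Train and predict with the sklearn interface.
    3. Train and predict with the native train interface.
  • Step 4: Multiclass classification task
    1. Use make_classification to create a multiclass dataset.
    2. Train and predict with the sklearn interface.
    3. Train and predict with the native train interface.
  • Step 5: Regression task
    1. Use make_regression to create a regression dataset.
    2. Train and predict with the sklearn interface.
    3. Train and predict with the native train interface.

Steps 1 and 2 still use the iris dataset from Tasks 1 and 2; Steps 3, 4, and 5 use datasets generated here.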

Step 1

import pandas as pd
import lightgbm as lgb
from sklearn import datasets
from sklearn.datasets import make_classification
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Load the dataset
iris = datasets.load_iris()

# Split the raw data into training, validation, and test sets
train_data_all, test_data, train_y_all, test_y = \
    train_test_split(iris.data, iris.target, test_size=0.2, random_state=1, shuffle=True, stratify=iris.target)
train_data, val_data, train_y, val_y = \
    train_test_split(train_data_all, train_y_all, test_size=0.2, random_state=1, shuffle=True, stratify=train_y_all)
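The two-stage split above can be checked quickly. The following is a minimal sketch that repeats the same calls under shorter, hypothetical variable names and prints the resulting sizes and per-class counts; since `stratify` is used at both stages, each of the three iris classes should be equally represented in every subset.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# Stage 1: hold out 20% of the 150 samples as the test set
X_all, X_test, y_all, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=1,
    shuffle=True, stratify=iris.target)
# Stage 2: hold out 20% of the remaining 120 samples as the validation set
X_train, X_val, y_train, y_val = train_test_split(
    X_all, y_all, test_size=0.2, random_state=1,
    shuffle=True, stratify=y_all)

print(X_train.shape, X_val.shape, X_test.shape)   # 96 / 24 / 30 rows
print(np.bincount(y_train), np.bincount(y_val), np.bincount(y_test))
```

The 96 training rows are the same number LightGBM reports as "Number of data points in the train set" when training on this split.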

The sklearn LGBMClassifier model

# sklearn LGBMClassifier model
params_sklearn = {
    'learning_rate': 0.1,
    'max_bin': 150,
    'num_leaves': 32,
    'max_depth': 11,

    'reg_alpha': 0.1,
    'reg_lambda': 0.2,

    'objective': 'multiclass',
    'n_estimators': 300,
    # 'class_weight': weight
}

# Define the model
clf = lgb.LGBMClassifier(**params_sklearn)
# Train the model; early stopping and logging are passed as callbacks,
# because the early_stopping_rounds and verbose arguments of fit() are deprecated
clf.fit(train_data, train_y, eval_set=[(val_data, val_y)],
        callbacks=[lgb.early_stopping(10), lgb.log_evaluation(10)])
# Predict on the test set
y_pred = clf.predict(test_data)

[10]	valid_0's multi_logloss: 0.422237
[20]	valid_0's multi_logloss: 0.290667
[30]	valid_0's multi_logloss: 0.25248
[40]	valid_0's multi_logloss: 0.253147

# Compare with the ground truth
print('pred:', y_pred)
print('test:', test_y)
pred: [2 0 1 0 0 0 2 2 2 1 0 1 2 1 2 0 2 1 1 2 1 1 0 0 2 1 0 0 1 1]
test: [2 0 1 0 0 0 2 2 2 1 0 1 2 1 2 0 2 1 1 2 1 1 0 0 2 2 0 0 1 1]

Only one of the 30 predictions is wrong, so the accuracy is high (29/30, about 96.7%).
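The comparison can also be made numerically rather than by eye. A minimal sketch, with the two arrays copied from the printed output above (in the running notebook one would use `y_pred` and `test_y` directly):

```python
import numpy as np

# Values copied from the printed output above
pred = np.array([2, 0, 1, 0, 0, 0, 2, 2, 2, 1, 0, 1, 2, 1, 2,
                 0, 2, 1, 1, 2, 1, 1, 0, 0, 2, 1, 0, 0, 1, 1])
test = np.array([2, 0, 1, 0, 0, 0, 2, 2, 2, 1, 0, 1, 2, 1, 2,
                 0, 2, 1, 1, 2, 1, 1, 0, 0, 2, 2, 0, 0, 1, 1])

wrong = np.flatnonzero(pred != test)   # indices where prediction and label differ
accuracy = np.mean(pred == test)       # fraction of correct predictions

print('misclassified indices:', wrong)
print('accuracy: %.4f' % accuracy)
```

The single mismatch is a class-2 sample predicted as class 1.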

The sklearn LGBMRegressor model

# No regression dataset has been loaded yet, so the model is only defined here, not trained
# sklearn LGBMRegressor model
params = {
        'num_leaves': 54,
        'objective': 'regression',
        'max_depth': 18,
        'learning_rate': 0.01,
        'boosting': 'gbdt',
        'metric': 'rmse',
        'lambda_l1': 0.1
}
reg = lgb.LGBMRegressor(**params, n_estimators=20000, n_jobs=4)

The sklearn LGBMRanker model

A model commonly used in recommender systems; see https://blog.csdn.net/wuzhongqiang/article/details/110521519

# sklearn LGBMRanker model
# On the choice of 'objective', see: https://github.com/xuetf/KDD_CUP_2020_Debiasing_Rush/issues/4

# Argument list quoted for reference only (not executed):
#   boosting_type='gbdt', num_leaves=31, reg_alpha=0.0, reg_lambda=1,
#   max_depth=-1, n_estimators=300, objective='binary',
#   subsample=0.7, colsample_bytree=0.7, subsample_freq=1,
#   learning_rate=0.01, min_child_weight=50, random_state=2018,
#   n_jobs=-1
params = {
        'num_leaves': 54,
        'objective': 'lambdarank',
        'max_depth': 18,
        'learning_rate': 0.01,
        'boosting': 'gbdt',
        'metric': 'ndcg',   # NDCG is the usual evaluation metric for lambdarank
        'lambda_l1': 0.1
}
rank = lgb.LGBMRanker(**params, n_estimators=20000, n_jobs=4)

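Step 2: Using LightGBM's native train interface

Note that with LightGBM's native train interface, the input data must first be wrapped in a Dataset object, which is then passed to the training function.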

# lgb.train takes the model parameters as a dict:
params_naive = {
    'learning_rate': 0.1,
    'max_bin': 150,
    'num_leaves': 32,
    'max_depth': 11,

    'lambda_l1': 0.1,
    'lambda_l2': 0.2,

    'objective': 'multiclass',
    'num_class': 3,
}


# Wrap the data for the native interface
dtrain = lgb.Dataset(train_data, label=train_y)
dtest = lgb.Dataset(test_data, label=test_y)
dval = lgb.Dataset(val_data, label=val_y)

# Early stopping and logging are passed as callbacks, because the
# early_stopping_rounds and verbose_eval arguments of lgb.train are deprecated
clf = lgb.train(params=params_naive, train_set=dtrain, valid_sets=[dtrain, dval], num_boost_round=300,
                callbacks=[lgb.early_stopping(10), lgb.log_evaluation(10)])

[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000053 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 78
[LightGBM] [Info] Number of data points in the train set: 96, number of used features: 4
[LightGBM] [Info] Start training from score -1.098612
[LightGBM] [Info] Start training from score -1.098612
[LightGBM] [Info] Start training from score -1.098612
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
Training until validation scores don't improve for 10 rounds
[10]	training's multi_logloss: 0.298963	valid_1's multi_logloss: 0.422237
[20]	training's multi_logloss: 0.122397	valid_1's multi_logloss: 0.290667
[30]	training's multi_logloss: 0.0638911	valid_1's multi_logloss: 0.25248
[40]	training's multi_logloss: 0.0380959	valid_1's multi_logloss: 0.253147
Early stopping, best iteration is:
[32]	training's multi_logloss: 0.057511	valid_1's multi_logloss: 0.24906

The warning "No further splits with positive gain" was printed for almost every tree; its repeats are omitted above. With only 96 training samples and LightGBM's default min_data_in_leaf of 20, a tree runs out of admissible splits long before it reaches num_leaves=32.

# Predict: with a multiclass objective, predict() returns one probability per class for each row
y_pred = clf.predict(test_data)
y_pred
array([[0.00854976, 0.01336626, 0.97808399],
       [0.9695985 , 0.02322582, 0.00717568],
       [0.1266051 , 0.84130907, 0.03208582],
       [0.98113984, 0.01159906, 0.0072611 ],
       [0.8987924 , 0.08175213, 0.01945547],
       [0.98113984, 0.01159906, 0.0072611 ],
       [0.01584902, 0.01109586, 0.97305512],
       [0.00854976, 0.01336626, 0.97808399],
       [0.01929656, 0.14943477, 0.83126867],
       [0.01583122, 0.80600597, 0.17816281],
       [0.8987924 , 0.08175213, 0.01945547],
       [0.01596415, 0.96674736, 0.01728849],
       [0.00854976, 0.01336626, 0.97808399],
       [0.0128839 , 0.97000256, 0.01711354],
       [0.00984623, 0.03916364, 0.95099013],
       [0.85542871, 0.12023735, 0.02433394],
       [0.00854976, 0.01336626, 0.97808399],
       [0.01230884, 0.97153018, 0.01616098],
       [0.1266051 , 0.84130907, 0.03208582],
       [0.02180155, 0.18923031, 0.78896814],
       [0.01236308, 0.97581151, 0.0118254 ],
       [0.01437657, 0.97083508, 0.01478835],
       [0.98113984, 0.01159906, 0.0072611 ],
       [0.9695985 , 0.02322582, 0.00717568],
       [0.00854976, 0.01336626, 0.97808399],
       [0.02056858, 0.9545397 , 0.02489172],
       [0.8987924 , 0.08175213, 0.01945547],
       [0.9695985 , 0.02322582, 0.00717568],
       [0.04605153, 0.9231903 , 0.03075817],
       [0.01224923, 0.96682535, 0.02092542]])
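The native interface returns class probabilities rather than class labels, so one more step is needed to compare with `test_y`. A minimal sketch, using the first three rows of the matrix printed above as sample input (in the notebook one would apply `argmax` to the full `y_pred`):

```python
import numpy as np

# First three rows of the probability matrix printed above
proba = np.array([[0.00854976, 0.01336626, 0.97808399],
                  [0.9695985 , 0.02322582, 0.00717568],
                  [0.1266051 , 0.84130907, 0.03208582]])

labels = proba.argmax(axis=1)   # index of the most probable class per row
print(labels)                   # [2 0 1]
```

These labels agree with the first three predictions of the sklearn interface in Step 1.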

Step 3: Binary classification task

Use make_classification to create a binary classification dataset

# Use make_classification to create a binary classification dataset
bi_class_data = make_classification(
                        n_samples=10000, n_features=20, n_informative=5, n_redundant=2,
                        n_repeated=0, n_classes=2, n_clusters_per_class=2, 
                        flip_y=0.4, class_sep=1.0, 
                        hypercube=True,shift=0.0, scale=1.0, 
                        shuffle=True, random_state=2022
                )
data = pd.DataFrame(bi_class_data[0])
label = pd.DataFrame(bi_class_data[1])

label.value_counts()
1    5055
0    4945
dtype: int64
# Split the raw data into training, validation, and test sets
train_data_all, test_data, train_y_all, test_y = \
    train_test_split(data, label, test_size=0.2, random_state=1, shuffle=True, stratify=label)
train_data, val_data, train_y, val_y = \
    train_test_split(train_data_all, train_y_all, test_size=0.2, random_state=1, shuffle=True, stratify=train_y_all)

Train and predict with the sklearn interface

# sklearn LGBMClassifier model
params_sklearn = {
    'learning_rate': 0.1,
    'max_bin': 150,
    'num_leaves': 32,
    'max_depth': 11,

    'reg_alpha': 0.1,
    'reg_lambda': 0.2,
    'n_estimators': 300,
}

# Define the model
clf = lgb.LGBMClassifier(**params_sklearn)
# Train the model; the single-column label DataFrames are flattened to 1-D arrays,
# which is the shape sklearn expects for y
clf.fit(train_data, train_y.values.ravel(), eval_set=[(val_data, val_y.values.ravel())],
        callbacks=[lgb.early_stopping(10), lgb.log_evaluation(10)])
# Predict on the test set
y_pred = clf.predict(test_data)
y_pred
[10]	valid_0's binary_logloss: 0.550373
[20]	valid_0's binary_logloss: 0.523356
[30]	valid_0's binary_logloss: 0.515842
[40]	valid_0's binary_logloss: 0.516377

array([1, 1, 1, ..., 0, 1, 0])
test_y
         0
8762     1
2447     1
9380     1
7209     1
8379     1
...    ...
9013     0
1943     0
4228     0
6566     1
4712     1

2000 rows × 1 columns
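The validation log-loss levels off a little above 0.5, which is largely a property of the data rather than of the model. The dataset was created with flip_y=0.4, so 40% of the labels are assigned at random; with two classes a random label is wrong half of the time, so roughly 20% of all labels contradict the underlying class. The following back-of-the-envelope sketch assumes, as a simplification, that the classes would otherwise be perfectly separable, and computes the best accuracy and log-loss any model could reach under that assumption:

```python
import math

flip_y = 0.4       # fraction of labels assigned at random (from make_classification above)
n_classes = 2

# A randomly assigned label is wrong with probability (n_classes - 1) / n_classes
p_wrong = flip_y * (n_classes - 1) / n_classes
best_accuracy = 1 - p_wrong
# Log-loss of a model that predicts the underlying class with the ideal confidence 1 - p_wrong
logloss_floor = -(p_wrong * math.log(p_wrong) + (1 - p_wrong) * math.log(1 - p_wrong))

print(round(best_accuracy, 3), round(logloss_floor, 4))
```

Under these assumptions the floor is about 0.50, so a validation log-loss of roughly 0.516 is already close to what the noisy labels allow.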

Train and predict with the native train interface

# lgb.train takes the model parameters as a dict:
params_naive = {
    'learning_rate': 0.1,
    'max_bin': 150,
    'num_leaves': 32,
    'max_depth': 11,

    'lambda_l1': 0.1,
    'lambda_l2': 0.2,

    # The binary task is trained here as a two-class 'multiclass' problem,
    # so the metric is multi_logloss and predict() returns two probability columns
    'objective': 'multiclass',
    'num_class': 2,
}


# Wrap the data for the native interface
dtrain = lgb.Dataset(train_data, label=train_y)
dval = lgb.Dataset(val_data, label=val_y)

# Early stopping and logging are passed as callbacks, because the
# early_stopping_rounds and verbose_eval arguments of lgb.train are deprecated
clf = lgb.train(params=params_naive, train_set=dtrain, valid_sets=[dtrain, dval], num_boost_round=300,
                callbacks=[lgb.early_stopping(10), lgb.log_evaluation(10)])

[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000498 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3000
[LightGBM] [Info] Number of data points in the train set: 6400, number of used features: 20
[LightGBM] [Info] Start training from score -0.704145
[LightGBM] [Info] Start training from score -0.682269
Training until validation scores don't improve for 10 rounds
[10]	training's multi_logloss: 0.533821	valid_1's multi_logloss: 0.549425
[20]	training's multi_logloss: 0.485421	valid_1's multi_logloss: 0.522554
[30]	training's multi_logloss: 0.454127	valid_1's multi_logloss: 0.516862
[40]	training's multi_logloss: 0.429026	valid_1's multi_logloss: 0.517027
Early stopping, best iteration is:
[32]	training's multi_logloss: 0.448519	valid_1's multi_logloss: 0.516316

# Predict: each row holds the probabilities of class 0 and class 1
y_pred = clf.predict(test_data)
y_pred
array([[0.24276812, 0.75723188],
       [0.17581516, 0.82418484],
       [0.23915198, 0.76084802],
       ...,
       [0.78755907, 0.21244093],
       [0.47517021, 0.52482979],
       [0.76431998, 0.23568002]])

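Step 4: Multiclass classification task

Use make_classification to create a multiclass dataset

# Use make_classification to create a multiclass dataset
# (weights lists the first four class proportions; the fifth is inferred as the remainder, 0.25)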
mul_class_data = make_classification(n_samples=10000, n_features=20, n_informative=4, n_redundant=2,
                        n_repeated=0, n_classes=5, n_clusters_per_class=2, weights=[0.05,0.1,0.1,0.5],
                        flip_y=0.4, class_sep=1.0, hypercube=True,shift=0.0, scale=1.0, 
                        shuffle=True, random_state=2018)
data = pd.DataFrame(mul_class_data[0])
label = pd.DataFrame(mul_class_data[1])

label.value_counts()
3    3789
4    2266
2    1418
1    1413
0    1114
dtype: int64
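The class counts above do not match the requested weights directly, because flip_y=0.4 reassigns 40% of the labels. A rough calculation, assuming (as in scikit-learn's implementation, to my understanding) that a flipped sample receives a class drawn uniformly at random, reproduces the observed distribution closely:

```python
weights = [0.05, 0.1, 0.1, 0.5]
weights.append(1 - sum(weights))   # the last class weight is inferred: 0.25
flip_y, n_classes, n_samples = 0.4, 5, 10000

# Each class keeps (1 - flip_y) * weight of the samples from the unflipped part
# and receives flip_y / n_classes of the samples from the uniformly flipped part
expected = [round(n_samples * ((1 - flip_y) * w + flip_y / n_classes))
            for w in weights]
print(expected)   # [1100, 1400, 1400, 3800, 2300]
```

The observed counts (1114, 1413, 1418, 3789, 2266 for classes 0 to 4) are all within about 1.5% of these expectations.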
# Split the raw data into training, validation, and test sets
train_data_all, test_data, train_y_all, test_y = \
    train_test_split(data, label, test_size=0.2, random_state=1, shuffle=True, stratify=label)
train_data, val_data, train_y, val_y = \
    train_test_split(train_data_all, train_y_all, test_size=0.2, random_state=1, shuffle=True, stratify=train_y_all)

Train and predict with the sklearn interface

# sklearn LGBMClassifier model
params_sklearn = {
    'learning_rate': 0.1,
    'max_bin': 150,
    'num_leaves': 32,
    'max_depth': 11,

    'reg_alpha': 0.1,
    'reg_lambda': 0.2,

    # The number of classes is inferred from the labels, so no class-count parameter is needed
    'objective': 'multiclass',
    'n_estimators': 300,
    # 'class_weight': weight
}

# Define the model
clf = lgb.LGBMClassifier(**params_sklearn)
# Train the model; the single-column label DataFrames are flattened to 1-D arrays,
# which is the shape sklearn expects for y
clf.fit(train_data, train_y.values.ravel(), eval_set=[(val_data, val_y.values.ravel())],
        callbacks=[lgb.early_stopping(10), lgb.log_evaluation(10)])
# Predict on the test set
y_pred = clf.predict(test_data)

[10]	valid_0's multi_logloss: 1.32287
[20]	valid_0's multi_logloss: 1.3157
[30]	valid_0's multi_logloss: 1.31711

y_pred
array([3, 3, 0, ..., 3, 3, 3])
test_y
         0
5261     1
3330     3
4423     0
5602     3
2454     0
...    ...
993      2
3640     3
6746     0
4433     2
4354     3

2000 rows × 1 columns

Train and predict with the native train interface

# lgb.train takes the model parameters as a dict:
params_naive = {
    'learning_rate': 0.1,
    'max_bin': 150,
    'num_leaves': 32,
    'max_depth': 11,

    'lambda_l1': 0.1,
    'lambda_l2': 0.2,

    'objective': 'multiclass',
    'num_class': 5
}


# Wrap the data for the native interface
dtrain = lgb.Dataset(train_data, label=train_y)
dval = lgb.Dataset(val_data, label=val_y)

# Early stopping and logging are passed as callbacks, because the
# early_stopping_rounds and verbose_eval arguments of lgb.train are deprecated
clf = lgb.train(params=params_naive, train_set=dtrain, valid_sets=[dtrain, dval], num_boost_round=300,
                callbacks=[lgb.early_stopping(10), lgb.log_evaluation(10)])

[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000684 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3000
[LightGBM] [Info] Number of data points in the train set: 6400, number of used features: 20
[LightGBM] [Info] Start training from score -2.194572
[LightGBM] [Info] Start training from score -1.956118
[LightGBM] [Info] Start training from score -1.953911
[LightGBM] [Info] Start training from score -0.970466
[LightGBM] [Info] Start training from score -1.484734
Training until validation scores don't improve for 10 rounds
[10]	training's multi_logloss: 1.16251	valid_1's multi_logloss: 1.32287
[20]	training's multi_logloss: 1.01795	valid_1's multi_logloss: 1.3157
[30]	training's multi_logloss: 0.907985	valid_1's multi_logloss: 1.31711
Early stopping, best iteration is:
[22]	training's multi_logloss: 0.992725	valid_1's multi_logloss: 1.31435

# Predict: each row holds the probabilities of the five classes
y_pred = clf.predict(test_data)
y_pred
array([[0.06573803, 0.24028913, 0.25260646, 0.28651964, 0.15484674],
       [0.12112858, 0.11112768, 0.11250295, 0.53089178, 0.12434901],
       [0.38740289, 0.15078851, 0.08953237, 0.19059779, 0.18167844],
       ...,
       [0.08005472, 0.06012911, 0.16478034, 0.56084711, 0.13418871],
       [0.10006992, 0.1113494 , 0.21564907, 0.43883464, 0.13409696],
       [0.06697997, 0.10993549, 0.08865547, 0.61475259, 0.11967648]])

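Step 5: Regression task

Use make_regression to create a regression dataset

Parameters of make_regression:

n_samples: int, default=100
The number of samples.
n_features: int, default=100
The number of features.
n_informative: int, default=10
The number of informative features, i.e. the number of features used to build the linear model that generates the output.
n_targets: int, default=1
The number of regression targets, i.e. the dimension of the y output vector associated with a sample. By default, the output is a scalar.
bias: float, default=0.0
The bias term in the underlying linear model.
effective_rank: int, default=None
If not None: the approximate number of singular vectors required to explain most of the input data by linear combinations. Using this kind of singular spectrum in the input allows the generator to reproduce the correlations often observed in practice.
If None: the input set is well conditioned, centered, and Gaussian with unit variance.
tail_strength: float, default=0.5
The relative importance of the fat noisy tail of the singular values profile if effective_rank is not None. When a float, it should be between 0 and 1.
noise: float, default=0.0
The standard deviation of the Gaussian noise applied to the output.
shuffle: bool, default=True
Shuffle the samples and the features.
coef: bool, default=False
If True, the coefficients of the underlying linear model are returned.
random_state: int, RandomState instance or None, default=None
Determines random number generation for dataset creation. Pass an int for reproducible output across multiple function calls.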
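The roles of n_informative, noise, bias, and coef can be seen in a small experiment. The sketch below uses a smaller sample than this post (200 rows, an arbitrary choice) and asks for the coefficients of the underlying linear model; only n_informative of them should be non-zero, and with noise=0 and bias=0 the target should be an exact linear function of the features.

```python
import numpy as np
from sklearn.datasets import make_regression

X, y, coef = make_regression(n_samples=200, n_features=20, n_informative=4,
                             noise=0.0, bias=0.0, coef=True, random_state=2022)

informative = np.flatnonzero(coef)     # columns that actually drive y
print(informative, coef[informative])

# With noise=0 and bias=0 the target is an exact linear function of X
print(np.allclose(X @ coef, y))
```

Because the dataset in this post is created with the default noise=0.0, the remaining error of a tree model on it reflects how well piecewise-constant trees approximate a linear function, not irreducible noise.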

# Use make_regression to create a regression dataset
reg_data = make_regression(
                        n_samples=10000, n_features=20, n_informative=4,
                        shuffle=True, random_state=2022)
data = pd.DataFrame(reg_data[0])
label = pd.DataFrame(reg_data[1])

# label.value_counts()

Train and predict with the sklearn interface

# Split the raw data into training, validation, and test sets
train_data_all, test_data, train_y_all, test_y = \
    train_test_split(data, label, test_size=0.2, random_state=1, shuffle=True)
train_data, val_data, train_y, val_y = \
    train_test_split(train_data_all, train_y_all, test_size=0.2, random_state=1, shuffle=True)
params = {
    'num_leaves': 54,
    'objective': 'regression',
    'max_depth': 18,
    'learning_rate': 0.01,
    'boosting': 'gbdt',
    'metric': 'rmse',
    'lambda_l1': 0.1
}

model = lgb.LGBMRegressor(**params, n_estimators=20000, n_jobs=4)
# The single-column label DataFrames are flattened to 1-D arrays;
# early stopping and logging are passed as callbacks
model.fit(train_data, train_y.values.ravel(),
          eval_set=[(val_data, val_y.values.ravel())],
          eval_metric='rmse',
          callbacks=[lgb.early_stopping(200), lgb.log_evaluation(400)])
[400]	valid_0's rmse: 10.0853
[800]	valid_0's rmse: 7.47665
[1200]	valid_0's rmse: 7.39721
[1600]	valid_0's rmse: 7.37381
[2000]	valid_0's rmse: 7.3529
[2400]	valid_0's rmse: 7.34084

# Predict on the test set
y_pred = model.predict(test_data)
y_pred
array([-44.12178542, -66.2419684 ,  64.01033108, ..., 165.45905462,
       -92.42507017,  -3.32388838])
test_y
               0
9953  -54.013487
3850  -65.165979
4962   73.254042
3886  -10.609690
5437 -182.511371
...          ...
3919 -121.955407
162    -5.252414
7903  174.163314
2242  -84.114293
2745   -6.554235

2000 rows × 1 columns
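To put a number on how close the predictions are, the root mean squared error can be computed by hand. The sketch below only uses the six prediction/target pairs that are visible in the truncated outputs above (the first three and last three rows, as read off the display), so its result is a rough spot check and not the test-set RMSE.

```python
import numpy as np

# The six prediction/target pairs visible in the truncated output above
y_pred_shown = np.array([-44.12178542, -66.2419684, 64.01033108,
                         165.45905462, -92.42507017, -3.32388838])
y_true_shown = np.array([-54.013487, -65.165979, 73.254042,
                         174.163314, -84.114293, -6.554235])

def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

print(rmse(y_true_shown, y_pred_shown))
```

In the notebook itself, `rmse(test_y.values.ravel(), y_pred)` would give the error over all 2000 test rows, which can then be compared with the validation RMSE in the training log.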

Train and predict with the native train interface

# Build the datasets
lgb_train = lgb.Dataset(train_data, label=train_y)
lgb_val = lgb.Dataset(val_data, label=val_y)
# Parameters for lgb.train; a regression objective and metric match the continuous target
params = {
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': 'rmse',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbose': 0
}

# generate feature names
# feature_name = ['feature_' + str(col) for col in range(num_feature)]

# All 20 generated features are numeric, so no categorical_feature is specified
reg = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                valid_sets=[lgb_val],  # evaluate on the validation data
                # feature_name=feature_name,
                )
# Predict on the test set
y_pre = reg.predict(test_data)
y_pre