[C++ Knowledge Base] LightGBM C++ usage problems


The model has been tested in Python and works without problems.

In C++, however:

#include <iostream>
#include <LightGBM/application.h>
#include <LightGBM/c_api.h>
#include <iostream>
#include <vector>

#include <opencv2/opencv.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/core/core.hpp>

#include <stdlib.h>
#include <stdio.h>
#include <math.h>

//#include <tbb/tbb.h>
//using namespace tbb;
//https://cloud.tencent.com/developer/ask/sof/781779
/*
I had a similar issue and in my case I found that the problem was the is_linear property in the model.
I compared the model that I generated from the binary_classification example with the model I was using
and I noticed that the model in the example has the is_linear=0 property for each tree.
On my model it was missing.
 Then I checked the c++ code and found that if this property is missing,
 the variable describing this is true. I set it to false as default and that works for me.
 I can't give more details as I just recently began working with LGBM models and c++.
 * */

using namespace cv;
using namespace std;

int main()
{
	char srcimg[400]={0};

	int numiterations = 1;
	BoosterHandle handle;
	int set = LGBM_BoosterCreateFromModelfile("/home/jumper/xrt/reference/model/lgbmmodel/lbgm_zhu_1.model", &numiterations, &handle);
	if(set==0)
	{
		std::cout << "load model successfully !  "<< std::endl;
	}


	int channels[]={1,2};
	int histsize[]={8,8};
	float ghistrange[]={0,255};
	float rhistrange[]={0,255};
	const float *histsranges[]={ghistrange,rhistrange};
	for(int index=0;index<=100;index++)
	{
		sprintf(srcimg,"/home/jumper/xrt/reference/imgs/zhudoctor/orescnn/%d.png",index);
		Mat img=imread(srcimg,IMREAD_UNCHANGED);
		if(img.empty())
			continue;

		MatND hist;
		cv::calcHist(&img,1,channels,Mat(),hist,2,histsize,histsranges);

		std::vector<double> out(1, 0);
		double *out_result = static_cast<double *>(out.data());
		int64_t out_len;
		int res = LGBM_BoosterPredictForMat(handle,hist.data,C_API_DTYPE_FLOAT32,1,64,1,C_API_PREDICT_NORMAL,-1,"None",&out_len,out_result);

		std::cout <<"image id:"<<index<<" ---predict flag:"<<res<< " ---LGBM row predict result is: " << out[0] << std::endl;
	}

    return 0;
}

The problems are:

1. The probabilities differ from the ones Python produces.

2. Every input yields the same result, with no variation at all:

load model successfully !  
[LightGBM] [Warning] Unknown parameter None
image id:0 ---predict flag:0 ---LGBM row predict result is: 0.400716
[LightGBM] [Warning] Unknown parameter None
image id:1 ---predict flag:0 ---LGBM row predict result is: 0.400716
[LightGBM] [Warning] Unknown parameter None
image id:2 ---predict flag:0 ---LGBM row predict result is: 0.400716
[LightGBM] [Warning] Unknown parameter None
image id:3 ---predict flag:0 ---LGBM row predict result is: 0.400716
[LightGBM] [Warning] Unknown parameter None
image id:4 ---predict flag:0 ---LGBM row predict result is: 0.400716
[LightGBM] [Warning] Unknown parameter None
image id:5 ---predict flag:0 ---LGBM row predict result is: 0.400716
[LightGBM] [Warning] Unknown parameter None
image id:6 ---predict flag:0 ---LGBM row predict result is: 0.400716
[LightGBM] [Warning] Unknown parameter None
image id:7 ---predict flag:0 ---LGBM row predict result is: 0.400716
[LightGBM] [Warning] Unknown parameter None
image id:8 ---predict flag:0 ---LGBM row predict result is: 0.400716
[LightGBM] [Warning] Unknown parameter None
image id:9 ---predict flag:0 ---LGBM row predict result is: 0.400716
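
When every image gets the same score, one quick way to narrow things down is to look at the feature rows before they reach the predict call. The following is a minimal sketch, not part of LightGBM or OpenCV: `all_zero` and `same_features` are hypothetical helpers, and they assume the 64 histogram values have been copied into a `std::vector<float>` first.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true when every value in the feature row is zero,
// which usually means the preprocessing produced an empty histogram.
bool all_zero(const std::vector<float>& features) {
    for (float v : features) {
        if (v != 0.0f) return false;
    }
    return true;
}

// Hypothetical helper: true when two feature rows are element-wise equal
// within `tol`. Rows of different length are never considered equal.
bool same_features(const std::vector<float>& a,
                   const std::vector<float>& b,
                   float tol = 1e-6f) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::fabs(a[i] - b[i]) > tol) return false;
    }
    return true;
}
```

If `all_zero` is true for every image, or `same_features` is true for two clearly different images, the problem probably sits in the preprocessing and not in the model or the predict call.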

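For problem 2, the only thing I could find was this Q&A on Tencent Cloud (the answer is quoted in the comment at the top of the source above): "LightGBM produces the same probability on any input (C++)", https://cloud.tencent.com/developer/ask/sof/781779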
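
The answer quoted in the source comment above says the example model has an `is_linear=0` line for each tree while the affected model had none. A rough way to check a model file for this is sketched below. It assumes the text model format puts a `Tree=<index>` line at the start of each tree block; `ModelScan` and `scan_model` are names made up for this sketch.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical summary type for this sketch.
struct ModelScan {
    int trees = 0;           // lines starting with "Tree="
    int with_is_linear = 0;  // lines starting with "is_linear="
};

// Scan a LightGBM text model line by line and count tree headers and
// is_linear entries. Assumption: each tree block starts with "Tree=<index>".
ModelScan scan_model(std::istream& in) {
    ModelScan s;
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("Tree=", 0) == 0) {
            ++s.trees;
        } else if (line.rfind("is_linear=", 0) == 0) {
            ++s.with_is_linear;
        }
    }
    return s;
}
```

To use it, open the model file with `std::ifstream` and pass the stream in. If `trees` is greater than `with_is_linear`, the file lacks the property the quoted answer talks about, which may matter when a model trained with an older version is loaded by a newer library.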

I can't reach sites outside China at the moment, which is annoying. Once I can, this can probably be solved.

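My first suspicion was a version problem, so I decided to try upgrading from 2.1.1 to 3.3.1. The 2.x and 3.x releases differ a lot, and compiling 3.3.1 produced the following errors: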

1. LightGBM-3.3.1/include/LightGBM/utils/common.h:36:59: fatal error: ../../../external_libs/fmt/include/fmt/format.h: No such file or directory
LightGBM-3.3.1/include/LightGBM/utils/common.h:38:82: fatal error: ../../../external_libs/fast_double_parser/include/fast_double_parser.h: No such file or directory

2. LightGBM-3.3.1/src/treelearner/linear_tree_learner.cpp:7:23: fatal error: Eigen/Dense: No such file or directory
Eigen source: https://gitlab.com/libeigen/eigen/-/releases/3.4.0 , unpacked into LightGBM-3.3.1/external_libs/

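Fix: download the missing libraries from the external_libs directory of the LightGBM repository ("LightGBM/external_libs at master · microsoft/LightGBM · GitHub"), unpack them, and put them in the matching folders. After that the build went through smoothly and produced the shared library. Package the include and external_libs folders together with lib_lightgbm.so and the library is ready to use.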

The official example also passes.

However, when I used the library in my own application, I got these errors:

lightGBM/include/LightGBM/utils/common.h:57:26: error: 'void* malloc(size_t)' was declared 'extern' and later 'static' [-fpermissive]
/usr/local/lib/gcc/x86_64-pc-linux-gnu/9.1.0/include/mm_malloc.h:41:7: error: '__alignment' was not declared in this scope
   41 |   if (__alignment == 1)
      |       ^~~~~~~~~~~
/usr/local/lib/gcc/x86_64-pc-linux-gnu/9.1.0/include/mm_malloc.h:43:7: error: '__alignment' was not declared in this scope
   43 |   if (__alignment == 2 || (sizeof (void *) == 8 && __alignment == 4))
      |       ^~~~~~~~~~~
/usr/local/lib/gcc/x86_64-pc-linux-gnu/9.1.0/include/mm_malloc.h:45:31: error: '__alignment' was not declared in this scope
LightGBM/utils/common.h:58:21: error: 'void free(void*)' was declared 'extern' and later 'static' [-fpermissive]

I looked these errors up and found: https://github.com/microsoft/LightGBM/pull/5111

Compilation error for cpp tests on macOS with gcc and `thread` sanitizer · Issue #4331 · microsoft/LightGBM · GitHub

Other people have evidently run into the same problem:

LightGBM/utils/common.h:57:26: error: 'void* malloc(size_t)' was declared 'extern' and later 'static' [-fpermissive]

I just don't know how they solved it.

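However, on a different machine the same steps produce no errors at all! The two problems described earlier are still there, though:

1. The score differs from the one Python produces.

2. Different inputs all produce the same output score.

If any expert knows the answer, please let me know.

I suspect I am not calling the predict function correctly.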

/*!
 * \brief Make prediction for a new dataset.
 * \note
 * You should pre-allocate memory for ``out_result``:
 *   - for normal and raw score, its length is equal to ``num_class * num_data``;
 *   - for leaf index, its length is equal to ``num_class * num_data * num_iteration``;
 *   - for feature contributions, its length is equal to ``num_class * num_data * (num_feature + 1)``.
 * \param handle Handle of booster
 * \param data Pointer to the data space
 * \param data_type Type of ``data`` pointer, can be ``C_API_DTYPE_FLOAT32`` or ``C_API_DTYPE_FLOAT64``
 * \param nrow Number of rows
 * \param ncol Number of columns
 * \param is_row_major 1 for row-major, 0 for column-major
 * \param predict_type What should be predicted
 *   - ``C_API_PREDICT_NORMAL``: normal prediction, with transform (if needed);
 *   - ``C_API_PREDICT_RAW_SCORE``: raw score;
 *   - ``C_API_PREDICT_LEAF_INDEX``: leaf index;
 *   - ``C_API_PREDICT_CONTRIB``: feature contributions (SHAP values)
 * \param start_iteration Start index of the iteration to predict
 * \param num_iteration Number of iteration for prediction, <= 0 means no limit
 * \param parameter Other parameters for prediction, e.g. early stopping for prediction
 * \param[out] out_len Length of output result
 * \param[out] out_result Pointer to array with predictions
 * \return 0 when succeed, -1 when failure happens
 */
LIGHTGBM_C_EXPORT int LGBM_BoosterPredictForMat(BoosterHandle handle,
                                                const void* data,
                                                int data_type,
                                                int32_t nrow,
                                                int32_t ncol,
                                                int is_row_major,
                                                int predict_type,
                                                int start_iteration,
                                                int num_iteration,
                                                const char* parameter,
                                                int64_t* out_len,
                                                double* out_result);
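
The pre-allocation note in the header comment above can be turned into a small helper that computes how many doubles `out_result` needs. This is only a sketch of those three rules: `PredictKind` is a local stand-in for the `C_API_PREDICT_*` constants so that the snippet compiles without the LightGBM headers, and `required_out_len` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for C_API_PREDICT_NORMAL, _RAW_SCORE, _LEAF_INDEX and
// _CONTRIB, so the sketch does not depend on LightGBM headers.
enum class PredictKind { Normal, RawScore, LeafIndex, Contrib };

// Number of doubles to pre-allocate for out_result, following the three
// rules in the LGBM_BoosterPredictForMat documentation comment.
std::int64_t required_out_len(PredictKind kind,
                              std::int64_t num_class,
                              std::int64_t num_data,
                              std::int64_t num_iteration,
                              std::int64_t num_feature) {
    assert(num_class > 0 && num_data >= 0);
    switch (kind) {
        case PredictKind::Normal:
        case PredictKind::RawScore:
            return num_class * num_data;
        case PredictKind::LeafIndex:
            return num_class * num_data * num_iteration;
        case PredictKind::Contrib:
            return num_class * num_data * (num_feature + 1);
    }
    return 0;  // not reached for valid enum values
}
```

For a normal prediction on one row, assuming a single-output model (num_class = 1), this gives 1, so a one-element output vector is enough. Leaf-index and SHAP predictions need much larger buffers.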

char srcimg[400]={0};

	int numiterations = 1;
	BoosterHandle handle;
	int set = LGBM_BoosterCreateFromModelfile("/home/jumper/xrt/reference/model/lgbmmodel/lbgm_zhu_1.model", &numiterations, &handle);
	if(set==0)
	{
		std::cout << "load model successfully !  "<< std::endl;
	}


	int channels[]={1,2};
	int histsize[]={8,8};
	float ghistrange[]={0,255};
	float rhistrange[]={0,255};
	const float *histsranges[]={ghistrange,rhistrange};
	for(int index=0;index<=100;index++)
	{
		sprintf(srcimg,"/home/jumper/xrt/reference/imgs/zhudoctor/orescnn/%d.png",index);
		Mat img=imread(srcimg,IMREAD_UNCHANGED);
		if(img.empty())
			continue;

		MatND hist;
		cv::calcHist(&img,1,channels,Mat(),hist,2,histsize,histsranges);

		std::vector<double> out(2, 0);
		double *out_result = static_cast<double *>(out.data());
		int64_t out_len;
		int res = LGBM_BoosterPredictForMat(handle,hist.data,C_API_DTYPE_FLOAT32,1,64,1,C_API_PREDICT_NORMAL,0,-1,"None",&out_len,out_result);

		std::cout <<"image id:"<<index<<" ---predict flag:"<<res<< " ---LGBM row predict result is: " << out[0] << std::endl;
	}

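My apologies, this was my own blunder: the input image data type did not match. With that fixed (the image is now loaded with the default imread flag instead of IMREAD_UNCHANGED, and the histogram ranges run to 256 instead of 255), everything works: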

#include <iostream>
#include <LightGBM/application.h>
#include <LightGBM/c_api.h>
#include <iostream>
#include <vector>

#include <opencv2/opencv.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/core/core.hpp>

#include <stdlib.h>
#include <stdio.h>
#include <math.h>

using namespace cv;
using namespace std;

int main()
{
	char srcimg[400]={0};

	int numiterations = 1;
	BoosterHandle handle;
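	int set = LGBM_BoosterCreateFromModelfile("/home/jumper/imgs/lgbmmodel/lbgm_zhu_1.model", &numiterations, &handle);
	if(set!=0)
	{
		// Stop here: predicting with a handle that failed to load is not safe.
		std::cout << "failed to load model: " << LGBM_GetLastError() << std::endl;
		return 1;
	}
	std::cout << "load model successfully !  "<< std::endl;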

	int channels[]={1,2};
	int histsize[]={8,8};
	// calcHist treats the upper bound as exclusive, so 256 keeps pixel value 255.
	float ghistrange[]={0,256};
	float rhistrange[]={0,256};
	const float *histsranges[]={ghistrange,rhistrange};

	for(int index=0;index<=10;index++)
	{
		sprintf(srcimg,"/home/jumper/imgs/cnntmp/doctorzhu_fluorite/orescnn/%d.png",index);
		// Default flag (IMREAD_COLOR): the image is always 8-bit, 3-channel BGR.
		Mat img=imread(srcimg);
		if(img.empty())
			continue;

		MatND hist;
		cv::calcHist(&img,1,channels,Mat(),hist,2,histsize,histsranges);

		std::vector<double> out(1, 0);
		double *out_result = static_cast<double *>(out.data());
		int64_t out_len;
		int res = LGBM_BoosterPredictForMat(handle,hist.data,C_API_DTYPE_FLOAT32,1,64,1,C_API_PREDICT_NORMAL,0,-1,"None",&out_len,out_result);

		std::cout <<"image id:"<<index<<" ---LGBM row predict result is: " << out[0] <<std::endl;
	}

	// Release the booster created by LGBM_BoosterCreateFromModelfile.
	LGBM_BoosterFree(handle);
    return 0;
}

The results are now correct:

load model successfully !  
[LightGBM] [Warning] Unknown parameter: None
image id:0 ---LGBM row predict result is: 0.964677
[LightGBM] [Warning] Unknown parameter: None
image id:1 ---LGBM row predict result is: 0.877513
[LightGBM] [Warning] Unknown parameter: None
image id:2 ---LGBM row predict result is: 0.973227
[LightGBM] [Warning] Unknown parameter: None
image id:3 ---LGBM row predict result is: 0.895759
[LightGBM] [Warning] Unknown parameter: None
image id:4 ---LGBM row predict result is: 0.945096
[LightGBM] [Warning] Unknown parameter: None
image id:5 ---LGBM row predict result is: 0.792787
[LightGBM] [Warning] Unknown parameter: None
image id:6 ---LGBM row predict result is: 0.902854
[LightGBM] [Warning] Unknown parameter: None
image id:7 ---LGBM row predict result is: 0.965496
[LightGBM] [Warning] Unknown parameter: None
image id:8 ---LGBM row predict result is: 0.92893
[LightGBM] [Warning] Unknown parameter: None
image id:9 ---LGBM row predict result is: 0.110013
[LightGBM] [Warning] Unknown parameter: None
image id:10 ---LGBM row predict result is: 0.903951
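
A side note on the `Unknown parameter: None` warning in the logs: it most likely comes from passing the literal string "None" as the `parameter` argument. As far as I know, LightGBM expects this string to hold space-separated `key=value` pairs, and an empty string when there is nothing to set. The helper below is a hypothetical sketch of building such a string; `make_param_string` is not a LightGBM function.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: join key/value pairs into "key1=value1 key2=value2".
// An empty list gives an empty string, which sets no extra parameters.
std::string make_param_string(
    const std::vector<std::pair<std::string, std::string>>& kv) {
    std::string out;
    for (const auto& p : kv) {
        assert(!p.first.empty());  // a parameter needs a name
        if (!out.empty()) out += ' ';
        out += p.first + "=" + p.second;
    }
    return out;
}
```

The result's `c_str()` can then be passed where the code above passes "None". For example, a single pair `pred_early_stop` / `true` produces `pred_early_stop=true`; which keys are accepted depends on the LightGBM version in use.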

So if your C++ and Python results don't match, check whether the preprocessing is the same on both sides.
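
To compare preprocessing between the two sides, it can help to have a dependency-free reference for the feature vector. The sketch below computes an 8x8 histogram over the G and R channels of an interleaved 8-bit BGR buffer, flattened row-major into 64 floats, which is the layout the code above hands to the predict call. It is an approximation written for this post, not OpenCV's implementation, and `gr_histogram` is a hypothetical name. The one rule it relies on is that the upper bound of a histogram range is exclusive.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Uniform 2-D histogram over channels 1 (G) and 2 (R) of an interleaved
// 8-bit BGR buffer. Output index is g_bin * bins + r_bin.
std::vector<float> gr_histogram(const std::vector<std::uint8_t>& bgr,
                                int bins, float lo, float hi) {
    assert(bins > 0 && hi > lo && bgr.size() % 3 == 0);
    std::vector<float> hist(static_cast<std::size_t>(bins) * bins, 0.0f);
    const float scale = static_cast<float>(bins) / (hi - lo);
    for (std::size_t i = 0; i + 2 < bgr.size(); i += 3) {
        const float g = bgr[i + 1];
        const float r = bgr[i + 2];
        // Values outside [lo, hi) are not counted; hi itself is excluded.
        if (g < lo || g >= hi || r < lo || r >= hi) continue;
        int gb = static_cast<int>(std::floor((g - lo) * scale));
        int rb = static_cast<int>(std::floor((r - lo) * scale));
        gb = std::min(gb, bins - 1);  // guard against float rounding
        rb = std::min(rb, bins - 1);
        hist[static_cast<std::size_t>(gb) * bins + rb] += 1.0f;
    }
    return hist;
}
```

Calling it with `hi = 256` counts pixels whose G or R value is 255, while `hi = 255` silently drops them. That is one way the same image can yield different features in two pipelines. Printing these 64 values next to the ones from the Python side makes such a mismatch visible.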

~~~~~~~~~~~~~~~~~~~~~~~~~分界线~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I also came across a multithreading issue: "Lightgbm多线程卡死问题定位" (Tracking down a LightGBM multithreading hang) | 逸思杂陈