[Artificial Intelligence] [Text Classification] Evaluation Metrics for Multi-Intent Classification

The metrics fall into two main families: label based measures and example based measures.

label based measures

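Compute the metric separately for each class, then combine the per-class results into a single number with an averaging method (macro, micro, weighted or sample average, described below).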

Suppose we have the following data:

expected   predicted
A, C        A, B
C           C
A, B, C     B, C

Convert it to a binary indicator matrix with sklearn's MultiLabelBinarizer (columns in the order A, B, C):

expected    predicted
1 0 1       1 1 0
0 0 1       0 0 1
1 1 1       0 1 1
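A minimal sketch of this conversion, assuming scikit-learn is installed. Passing `classes` fixes the column order to A, B, C; without it `MultiLabelBinarizer` sorts the labels, which happens to give the same order here.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# The example data from the table above: one set of labels per sample
expected = [{"A", "C"}, {"C"}, {"A", "B", "C"}]
predicted = [{"A", "B"}, {"C"}, {"B", "C"}]

# Fix the column order explicitly so both matrices share the same A, B, C layout
mlb = MultiLabelBinarizer(classes=["A", "B", "C"])
y_true = mlb.fit_transform(expected)
y_pred = mlb.transform(predicted)

print(y_true)
print(y_pred)
```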

class A

TP = 1 (true 1, predicted 1)

FP = 0 (true 0, predicted 1)

TN = 1 (true 0, predicted 0)

FN = 1 (true 1, predicted 0)

TN   FP           1   0
FN   TP           1   1
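The four counts can be obtained for all classes at once. A minimal sketch, assuming NumPy and scikit-learn are installed; scikit-learn's `multilabel_confusion_matrix` returns one 2x2 matrix per class in the same `[[TN, FP], [FN, TP]]` layout used above.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

# Count the four cells column by column (one column per class: A, B, C)
tp = ((y_true == 1) & (y_pred == 1)).sum(axis=0)
fp = ((y_true == 0) & (y_pred == 1)).sum(axis=0)
tn = ((y_true == 0) & (y_pred == 0)).sum(axis=0)
fn = ((y_true == 1) & (y_pred == 0)).sum(axis=0)

# scikit-learn: one 2x2 matrix per class, laid out as [[TN, FP], [FN, TP]]
mcm = multilabel_confusion_matrix(y_true, y_pred)
print(mcm[0])  # class A
```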

Precision = TP / (TP + FP) = 1 / (1+0) = 1.0

Recall = TP / (TP + FN) = 1 / (1+1) = 0.5

F1-score = 2*p*r / (p+r) = 2*1*0.5 / (1+0.5) = 0.667

class B

TN   FP           1   1
FN   TP           0   1

Precision = 0.5

Recall = 1.0

F1-score = 0.667

class C

TN   FP           0   0
FN   TP           1   2

Precision = 1.0

Recall = 0.667

F1-score = 0.8
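The per-class numbers above can be reproduced from the TP/FP/FN counts and cross-checked against scikit-learn's `precision_recall_fscore_support` with `average=None`, which returns one value per class. A sketch assuming NumPy and scikit-learn are available.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

# Per-class counts worked out above, in class order A, B, C
tp = [1, 1, 2]
fp = [0, 1, 0]
fn = [1, 0, 1]

# precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = 2pr/(p+r), per class
precision = [t / (t + f) for t, f in zip(tp, fp)]
recall = [t / (t + f) for t, f in zip(tp, fn)]
f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

# average=None returns per-class arrays; support = true occurrences of each class
p_sk, r_sk, f_sk, support = precision_recall_fscore_support(y_true, y_pred, average=None)
print(precision, recall, f1)
print(p_sk, r_sk, f_sk, support)
```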

macro average

Precision (macro avg)
= (Precision of A + Precision of B + Precision of C) / 3
= (1 + 0.5 + 1) / 3
= 0.833
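A sketch comparing the hand calculation with scikit-learn's `precision_score(..., average="macro")`, assuming NumPy and scikit-learn are installed. Macro averaging gives every class the same weight, however rare it is.

```python
import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

# By hand: unweighted mean of the per-class precisions for A, B, C
per_class = [1.0, 0.5, 1.0]
macro_by_hand = sum(per_class) / len(per_class)

# scikit-learn computes the same quantity with average="macro"
macro_sk = precision_score(y_true, y_pred, average="macro")
print(round(macro_by_hand, 3), round(macro_sk, 3))  # 0.833 0.833
```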

micro average (preferred: it pools the counts of all classes, so every individual label decision carries equal weight)

Precision (micro avg)
= sum(TP) / (sum(TP) + sum(FP))
= (1+1+2) / ((1+1+2) + (0+1+0))
= 4 / 5
= 0.8
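A sketch of the pooled calculation, assuming NumPy and scikit-learn are installed: sum TP and FP over all classes first, then divide once, and compare with `average="micro"`.

```python
import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

# Pool TP and FP over every (sample, class) cell before dividing
tp = int(((y_true == 1) & (y_pred == 1)).sum())
fp = int(((y_true == 0) & (y_pred == 1)).sum())
micro_by_hand = tp / (tp + fp)

micro_sk = precision_score(y_true, y_pred, average="micro")
print(micro_by_hand, micro_sk)
```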

weighted average

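Support is the number of samples whose true label set contains the class: A = 2, B = 1, C = 3.

Precision(weighted avg)
= [(Precision of A * support A) + 
(Precision of B * support B) + 
(Precision of C * support C)] 
/ (support A + support B + support C)
= (1*2 + 0.5*1 + 1*3) / (2 + 1 + 3)
= 5.5 / 6
= 0.917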
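A sketch of the support-weighted average, assuming NumPy and scikit-learn are installed. Support is read off the column sums of the true indicator matrix and compared with `average="weighted"`.

```python
import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

per_class = np.array([1.0, 0.5, 1.0])  # precision of A, B, C
# Support = number of samples whose true label set contains the class
support = y_true.sum(axis=0)
weighted_by_hand = (per_class * support).sum() / support.sum()

weighted_sk = precision_score(y_true, y_pred, average="weighted")
print(round(weighted_by_hand, 3), round(weighted_sk, 3))  # 0.917 0.917
```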

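sample average

Compute precision separately for each sample (row), then average over the samples.

Row 1: true AC, predicted AB, precision 1/2 → one of the two predicted labels is correct

Row 2: true C, predicted C, precision 1

Row 3: true ABC, predicted BC, precision 1 → every predicted label is correct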
```
(1/2 + 1 + 1) / 3 = 5/6 = 0.833
```
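A row-wise sketch of the sample average, assuming NumPy and scikit-learn are installed. The by-hand version assumes every sample has at least one predicted label; scikit-learn instead handles an empty prediction through its `zero_division` setting.

```python
import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

# Per sample: correctly predicted labels / predicted labels
# (divides by zero if a row has no predicted label)
per_sample = (y_true & y_pred).sum(axis=1) / y_pred.sum(axis=1)
samples_by_hand = per_sample.mean()

samples_sk = precision_score(y_true, y_pred, average="samples")
print(per_sample, round(samples_by_hand, 3), round(samples_sk, 3))
```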
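classification_report

Or call classification_report directly: for multi-label input it prints the per-class scores together with the micro, macro, weighted and samples averages.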
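A minimal sketch, assuming NumPy and scikit-learn are installed. `target_names` only labels the rows; `output_dict=True` returns the same figures as nested dictionaries keyed by class name and by `"micro avg"`, `"macro avg"`, `"weighted avg"` and `"samples avg"`.

```python
import numpy as np
from sklearn.metrics import classification_report

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])
names = ["A", "B", "C"]

# Human-readable table: one row per class, then the four averages
print(classification_report(y_true, y_pred, target_names=names, digits=3))

# Same numbers as a dict, convenient for logging or assertions
report = classification_report(y_true, y_pred, target_names=names, output_dict=True)
print(report["micro avg"]["precision"])
```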

example based measures

Score each sample by comparing its true label set with its predicted label set, then average the scores over all samples.

hamming loss

The fraction of labels predicted incorrectly, out of all (sample, label) pairs.

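A sketch on the running example, assuming NumPy and scikit-learn are installed: count the mismatched cells of the indicator matrix and compare with `hamming_loss`. Here 3 of the 9 cells differ.

```python
import numpy as np
from sklearn.metrics import hamming_loss

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

# Fraction of (sample, label) cells where prediction and truth disagree
by_hand = (y_true != y_pred).mean()
sk = hamming_loss(y_true, y_pred)
print(by_hand, sk)
```

Lower is better; 0 means every label of every sample is right.
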
subset accuracy

Also called the exact match ratio.

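The strictest measure: a sample scores 1 only if its predicted label set matches the true label set exactly, otherwise 0. It ignores partially correct predictions. In scikit-learn, accuracy_score on multi-label input computes subset accuracy.
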
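A sketch on the running example, assuming NumPy and scikit-learn are installed. Only row 2 (true C, predicted C) matches exactly, so the score is 1/3.

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

# A sample counts only if every label in its row matches
exact = (y_true == y_pred).all(axis=1)
by_hand = exact.mean()

# On multi-label indicator input, accuracy_score is subset accuracy
sk = accuracy_score(y_true, y_pred)
print(exact, by_hand, sk)
```
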
example-based accuracy

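For each sample, the number of correctly predicted labels divided by the number of labels that are predicted 1 or true 1 (the union of the two label sets).
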
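Intersection over union per sample is the Jaccard index, so this measure should match scikit-learn's `jaccard_score` with `average="samples"`. A sketch assuming NumPy and scikit-learn are installed; it also assumes no sample has an empty union (true and predicted sets both empty).

```python
import numpy as np
from sklearn.metrics import jaccard_score

y_true = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
y_pred = np.array([[1, 1, 0], [0, 0, 1], [0, 1, 1]])

# Per sample: |true AND predicted| / |true OR predicted|
inter = (y_true & y_pred).sum(axis=1)
union = (y_true | y_pred).sum(axis=1)
per_sample = inter / union
by_hand = per_sample.mean()

sk = jaccard_score(y_true, y_pred, average="samples")
print(per_sample, round(by_hand, 3), round(sk, 3))
```

Row 1 scores 1/3, row 2 scores 1 and row 3 scores 2/3, for a mean of 2/3.
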
example-based precision
For each sample, the number of correctly predicted labels divided by the number of predicted labels.
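This is the same quantity as the sample average of precision in the label based section. The sketch below works on the raw label sets instead of the binarized matrix; `example_based_precision` is a hypothetical helper name, and it assumes every sample has at least one predicted label.

```python
def example_based_precision(expected, predicted):
    """Mean over samples of |true & predicted| / |predicted| (hypothetical helper)."""
    scores = [len(t & p) / len(p) for t, p in zip(expected, predicted)]
    return sum(scores) / len(scores)


expected = [{"A", "C"}, {"C"}, {"A", "B", "C"}]
predicted = [{"A", "B"}, {"C"}, {"B", "C"}]
score = example_based_precision(expected, predicted)
print(round(score, 3))  # 0.833
```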