Table of Contents
Introduction
Motivation
Main Work
3.2 Explicitly N-gram Masked Language Modeling
3.3 Comprehensive N-gram Prediction
3.4 Enhanced N-gram Relation Modeling
Experimental Results
Ablation Studies
Effect of Explicitly N-gram MLM
Size of N-gram Lexicon
Effect of Comprehensive N-gram Prediction and Enhanced N-gram Relation Modeling
Reflections
Introduction
ERNIE-Gram is an explicitly n-gram masking and predicting method designed to eliminate the limitations of previous contiguous masking strategies and to incorporate coarse-grained linguistic information into pre-training more sufficiently. ERNIE-Gram conducts comprehensive n-gram prediction and relation modeling to further enhance the learning of semantic n-grams during pre-training.
Motivation
- BERT’s MLM focuses on the representations of fine-grained text units (e.g. words or subwords in English and characters in Chinese), rarely considering coarse-grained linguistic information (e.g. named entities or phrases in English and words in Chinese), thus incurring inadequate representation learning.
- Many efforts have been devoted to integrating coarse-grained semantic information by independently masking and predicting contiguous sequences of n tokens, namely n-grams, such as named entities, phrases (Sun et al., 2019b), and whole words.
- The authors argue that such contiguous masking strategies are less effective and reliable, since the predictions of the tokens in a masked n-gram are independent of each other, which neglects the intra-dependencies of n-grams.
Main Work
3.2 Explicitly N-gram Masked Language Modeling
- As shown in Figure 1(a): the previous contiguously MLM ignores the dependencies among the tokens inside an n-gram, so at prediction time the individual tokens of a masked n-gram are predicted independently of each other. The loss is computed as follows:
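The formula below is reconstructed from the paper's notation, where $\mathbf{z}_{\mathcal{M}}$ denotes the set of masked n-grams and $\mathbf{x}_{\setminus\mathcal{M}}$ the unmasked context:

$$-\log p_\theta(\mathbf{z}_{\mathcal{M}} \mid \mathbf{x}_{\setminus\mathcal{M}}) = -\sum_{\mathbf{z}\in\mathbf{z}_{\mathcal{M}}}\;\sum_{x\in\mathbf{z}}\log p_\theta\left(x \mid \mathbf{x}_{\setminus\mathcal{M}}\right)$$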
- As shown in Figure 1(b): explicitly N-gram MLM treats each n-gram as a single unit (token), which requires an additional n-gram lexicon, so the whole n-gram is predicted at a single position. The loss is computed as follows:
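Again reconstructed from the paper: $\mathbf{y}_{\mathcal{M}}$ are the n-gram identities from the lexicon, and $\bar{\mathbf{x}}_{\setminus\mathcal{M}}$ is the context in which each masked n-gram is collapsed into a single [MASK] slot:

$$-\log p_\theta(\mathbf{y}_{\mathcal{M}} \mid \bar{\mathbf{x}}_{\setminus\mathcal{M}}) = -\sum_{y\in\mathbf{y}_{\mathcal{M}}}\log p_\theta\left(y \mid \bar{\mathbf{x}}_{\setminus\mathcal{M}}\right)$$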
3.3 Comprehensive N-gram Prediction
- Going one step further, this work predicts the masked n-gram both as a whole segment and token by token at the same time. The authors carefully design the attention mask matrix to make this work; see the original paper for details, and the sketch below for the intuition.
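To make the idea concrete, here is a minimal NumPy sketch of such an attention mask, under my own assumptions about the layout (the function and index names are illustrative, not the paper's implementation): context positions see only the context, while the coarse-grained [MASK] and the fine-grained [MASK]s each see the context plus their own stream, so neither prediction stream can leak answers to the other.

```python
import numpy as np

def build_attention_mask(seq_len, ctx_idx, coarse_idx, fine_idx):
    """Illustrative mask for joint coarse-/fine-grained n-gram prediction.

    True = attention allowed, False = blocked.
    - context positions attend only to context (targets stay hidden),
    - the coarse-grained [MASK] attends to context + itself,
    - the fine-grained [MASK]s attend to context + each other,
    so the coarse and fine prediction streams never see each other.
    """
    allow = np.zeros((seq_len, seq_len), dtype=bool)
    for i in ctx_idx:                 # context <-> context
        allow[i, ctx_idx] = True
    for i in coarse_idx:              # coarse stream
        allow[i, ctx_idx] = True
        allow[i, coarse_idx] = True
    for i in fine_idx:                # fine stream
        allow[i, ctx_idx] = True
        allow[i, fine_idx] = True
    return allow

# Toy example: tokens x1..x3 visible, one n-gram [MASK] at position 3,
# two token-level [MASK]s at positions 4 and 5.
mask = build_attention_mask(6, ctx_idx=[0, 1, 2], coarse_idx=[3], fine_idx=[4, 5])
print(mask.astype(int))
```

In an actual transformer, this boolean matrix would be applied as additive $-\infty$ biases on the attention logits.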
3.4 Enhanced N-gram Relation Modeling
- To explicitly learn the semantic relationships between n-grams, we jointly pre-train a small generator model θ′ with the explicitly n-gram MLM objective to sample plausible n-gram identities. Then we employ the generated identities to perform masking and train the standard model θ to predict the original n-grams from fake ones in coarse-grained and fine-grained manners, as shown in Figure 3(a), which is efficient to model the pair relationships between similar n-grams.
- This models the relationships between n-grams, borrowing part of the idea behind ELECTRA; a conceptual sketch follows.
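Below is a minimal, hypothetical PyTorch sketch of the ELECTRA-style corruption step: a small generator proposes a plausible n-gram identity at each masked slot, and the standard model θ is then trained to recover the original n-gram (both as a whole and token by token). The `generator` interface and all names here are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn.functional as F

def corrupt_with_generator(generator, input_ids, masked_slots):
    """ELECTRA-style corruption: fill each masked slot with an n-gram
    identity sampled from a small generator model (theta').

    generator:    hypothetical callable returning logits over the n-gram
                  lexicon, shape (batch, seq_len, lexicon_size)
    input_ids:    (batch, seq_len) tensor with [MASK] ids at masked slots
    masked_slots: list of (batch_index, position) pairs
    """
    with torch.no_grad():
        logits = generator(input_ids)
    corrupted = input_ids.clone()
    for b, pos in masked_slots:
        probs = F.softmax(logits[b, pos], dim=-1)
        # Sampling (rather than argmax) yields varied yet plausible fakes.
        corrupted[b, pos] = torch.multinomial(probs, num_samples=1).item()
    return corrupted
```

Training θ to tell the original n-gram apart from a semantically close fake is what models the pairwise relations between similar n-grams.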
Experimental Results
Ablation Studies
Effect of Explicitly N-gram MLM
- The improvement of explicitly N-gram MLM over contiguously MLM is not as large as one might expect, only around 0.5 points.
Size of N-gram Lexicon
Effect of Comprehensive N-gram Prediction and Enhanced N-gram Relation Modeling
Reflections
- The whole work feels rather complex; apparently, achieving real gains and topping leaderboards is not easy. Still, the approach does not feel that smooth or elegant; the simplest way is often the best.
- When I worked on related projects, I had no good solution for n-grams or spans either (I wanted to enlarge the vocabulary to include whole words). I had not expected that the crude contiguously MLM would actually work, yet the improvement of explicitly N-gram MLM over contiguously MLM is not as large as I imagined (I was naive). (Incidentally, this also suggests that plain character-level processing performs reasonably well.)