
[AI] Paper Notes: Enhanced LSTM for Natural Language Inference

Enhanced LSTM for Natural Language Inference

https://arxiv.org/pdf/1609.06038v3.pdf

Related Work

  • Enhancing sequential inference models based on chain networks
  • Further, considering recursive architectures to encode syntactic parsing information

Hybrid Neural Inference Models

Major components

  • Input encoding, local inference modeling, inference composition
  • ESIM (sequential NLI model), tree-LSTM (incorporates syntactic parsing information)
Notation
  • Two sentences:
    • $a = (a_1, ..., a_{l_a})$
    • $b = (b_1, ..., b_{l_b})$
  • Each word is embedded as an $l$-dimensional vector: $a_i, b_j \in \mathbb{R}^l$
  • $\bar{a}_i$: hidden state generated by the BiLSTM at time $i$ over the input sequence $a$
Goal
  • Predict a label $y$ that indicates the logical relationship between $a$ and $b$

Input Encoding

  • Use a BiLSTM to encode the input premise and hypothesis

  • The hidden states generated by the two LSTMs at each time step are concatenated to represent that time step and its context

  • Encode syntactic parse trees of a premise and hypothesis through tree-LSTM

  • Each tree node is deployed with a tree-LSTM memory block

    • At each node, an input vector $x_t$ and the hidden vectors of its two children ($h^L_{t-1}$ and $h^R_{t-1}$) are taken as input to calculate the current node's hidden vector $h_t$
  • Detailed computation:

    • $h_t=\mathrm{TrLSTM}(x_t, h^L_{t-1}, h^R_{t-1})$
    • $h_t=o_t\odot \tanh(c_t)$
    • $o_t=\sigma(W_o x_t+U^L_o h^L_{t-1}+U^R_o h^R_{t-1})$
    • $c_t=f^L_t \odot c^L_{t-1}+f^R_t\odot c^R_{t-1}+i_t\odot u_t$
    • $f^L_t=\sigma(W_f x_t+U^{LL}_f h^L_{t-1}+U^{LR}_f h^R_{t-1})$
    • $f^R_t=\sigma(W_f x_t+U^{RL}_f h^L_{t-1}+U^{RR}_f h^R_{t-1})$
    • $i_t=\sigma(W_i x_t+U^L_i h^L_{t-1}+U^R_i h^R_{t-1})$
    • $u_t=\tanh(W_c x_t+U^L_c h^L_{t-1}+U^R_c h^R_{t-1})$
  • All $W\in \mathbb{R}^{d\times l}$ and $U\in\mathbb{R}^{d\times d}$ are weight matrices to be learned
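
To make the node update concrete, here is a minimal PyTorch sketch of a binary tree-LSTM cell implementing the gate equations above; the class name, tensor shapes, and the choice of PyTorch are my own assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class BinaryTreeLSTMCell(nn.Module):
    """One tree-LSTM node: (h_t, c_t) from x_t and the two children's states,
    following the gate equations listed above."""
    def __init__(self, l, d):
        super().__init__()
        # input-to-gate projections W_o, W_f, W_i, W_c, each in R^{d x l}
        self.W_o = nn.Linear(l, d)
        self.W_f = nn.Linear(l, d)
        self.W_i = nn.Linear(l, d)
        self.W_c = nn.Linear(l, d)
        # child-hidden-to-gate projections; each wraps a pair [U^L ; U^R] in R^{d x 2d}
        self.U_o = nn.Linear(2 * d, d, bias=False)
        self.U_i = nn.Linear(2 * d, d, bias=False)
        self.U_c = nn.Linear(2 * d, d, bias=False)
        self.U_fL = nn.Linear(2 * d, d, bias=False)  # [U_f^{LL} ; U_f^{LR}]
        self.U_fR = nn.Linear(2 * d, d, bias=False)  # [U_f^{RL} ; U_f^{RR}]

    def forward(self, x, hL, cL, hR, cR):
        h_lr = torch.cat([hL, hR], dim=-1)                  # children's hidden states
        o = torch.sigmoid(self.W_o(x) + self.U_o(h_lr))     # output gate
        i = torch.sigmoid(self.W_i(x) + self.U_i(h_lr))     # input gate
        fL = torch.sigmoid(self.W_f(x) + self.U_fL(h_lr))   # forget gate, left child
        fR = torch.sigmoid(self.W_f(x) + self.U_fR(h_lr))   # forget gate, right child
        u = torch.tanh(self.W_c(x) + self.U_c(h_lr))        # candidate cell input
        c = fL * cL + fR * cR + i * u                        # c_t
        h = o * torch.tanh(c)                                # h_t = o_t ⊙ tanh(c_t)
        return h, c
```

A whole tree would be encoded by applying this cell bottom-up over the parse tree, passing each child's (h, c) pair into its parent's call.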

Local Inference Modeling

Locality of inference
  • Employ some forms of hard or soft alignment to associate the relevant subcomponents between a premise and a hypothesis
  • Argue for leveraging attention over the bidirectional sequential encoding of the input
  • The soft alignment layer computes the attention weights as the similarity of a hidden state tuple $\langle \bar a_i, \bar b_j \rangle$ between a premise and a hypothesis: $e_{ij}= \bar{a}^T_i \bar b_j$
  • use bidirectional LSTM and tree-LSTM to encode the premise and hypothesis
  • In the sequential inference model, the BiLSTM is used
Local inference collected over sequences
  • Local inference is determined by the attention weight $e_{ij}$, which is used to obtain the local relevance between the premise and the hypothesis
  • The content in $\{\bar b_j\}^{l_b}_{j=1}$ that is relevant to $\bar a_i$ is selected and represented as $\tilde a_i$ (and symmetrically for $\tilde b_j$):

$$\tilde a_i =\sum_{j=1}^{l_b}\frac{\exp(e_{ij})}{\sum^{l_b}_{k=1}\exp(e_{ik})}\bar b_j, \quad \forall i \in[1,...,l_a]$$

$$\tilde b_j =\sum_{i=1}^{l_a}\frac{\exp(e_{ij})}{\sum^{l_a}_{k=1}\exp(e_{kj})}\bar a_i, \quad \forall j \in[1,...,l_b]$$
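
A minimal sketch of this soft alignment for the sequential (BiLSTM) branch, assuming `a_bar` and `b_bar` are batched hidden states of shape (batch, l_a, d) and (batch, l_b, d); the function name and shapes are illustrative assumptions.

```python
import torch

def soft_align(a_bar, b_bar):
    """e_ij = a_bar_i^T b_bar_j, then each sentence attends over the other."""
    e = torch.bmm(a_bar, b_bar.transpose(1, 2))            # (batch, l_a, l_b)
    # a_tilde_i: weights normalized over j (the b side), as in the first formula above
    a_tilde = torch.bmm(torch.softmax(e, dim=2), b_bar)    # (batch, l_a, d)
    # b_tilde_j: weights normalized over i (the a side), as in the second formula above
    b_tilde = torch.bmm(torch.softmax(e, dim=1).transpose(1, 2), a_bar)  # (batch, l_b, d)
    return a_tilde, b_tilde
```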

Local inference collected over parse trees
  • Compute the difference and the element-wise product for the tuple $\langle \bar a, \tilde a \rangle$ as well as for $\langle \bar b, \tilde b \rangle$
  • The difference and element-wise product are then concatenated with the original vectors

$$m_a=[\bar a;\tilde a;\bar a-\tilde a;\bar a \odot \tilde a]$$

$$m_b=[\bar b;\tilde b;\bar b-\tilde b;\bar b \odot \tilde b]$$
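
This enhancement step is just a concatenation along the feature dimension; a small sketch, with tensor shapes assumed as in the alignment sketch above:

```python
import torch

def enhance(x_bar, x_tilde):
    """m = [x; x~; x - x~; x ⊙ x~] along the last (feature) dimension."""
    return torch.cat([x_bar, x_tilde, x_bar - x_tilde, x_bar * x_tilde], dim=-1)

# m_a = enhance(a_bar, a_tilde)   # (batch, l_a, 4d)
# m_b = enhance(b_bar, b_tilde)   # (batch, l_b, 4d)
```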

Inference Composition

  • Explore a composition layer to compose the enhanced local inference information $m_a$ and $m_b$
The composition layer
  • In the sequential inference model, a BiLSTM is used to compose local inference information sequentially
  • The BiLSTM captures the local inference information $m_a$ and $m_b$ and their context for inference composition
  • In the tree composition, each tree node performs the following update to compose local inference:

$$v_{a,t}=\mathrm{TrLSTM}(F(m_{a,t}), h^L_{t-1}, h^R_{t-1})$$

$$v_{b,t}=\mathrm{TrLSTM}(F(m_{b,t}), h^L_{t-1}, h^R_{t-1})$$

  • $F$ is a 1-layer feedforward neural network with the ReLU activation; it is also applied to the input of the BiLSTM in the sequential inference composition, as sketched below
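
A sketch of the sequential composition branch, with $F$ as a 1-layer feedforward projection with ReLU followed by a composition BiLSTM; the class name and layer sizes are assumptions for illustration.

```python
import torch.nn as nn

class Composition(nn.Module):
    """Project enhanced features with F (1-layer FFN + ReLU), then compose with a BiLSTM."""
    def __init__(self, enhanced_dim, hidden_dim):
        super().__init__()
        self.F = nn.Sequential(nn.Linear(enhanced_dim, hidden_dim), nn.ReLU())
        self.bilstm = nn.LSTM(hidden_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, m):              # m: (batch, length, enhanced_dim)
        v, _ = self.bilstm(self.F(m))  # v: (batch, length, 2 * hidden_dim)
        return v

# compose = Composition(enhanced_dim=4 * 600, hidden_dim=300)  # sizes illustrative
# v_a, v_b = compose(m_a), compose(m_b)
```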
Pooling
  • Convert the resulting vectors obtained above into a fixed-length vector with pooling and feed it to the final classifier to determine the overall inference relationship
  • Compute both average and max pooling, and concatenate all these vectors to form the final fixed-length vector $v$

$$v_{a,ave}=\sum_{i=1}^{l_a}\frac{v_{a,i}}{l_a}, \quad v_{a,max}=\max_{i=1}^{l_a}v_{a,i}$$

$$v_{b,ave}=\sum_{j=1}^{l_b}\frac{v_{b,j}}{l_b}, \quad v_{b,max}=\max_{j=1}^{l_b}v_{b,j}$$

$$v =[v_{a,ave};v_{a,max};v_{b,ave};v_{b,max}]$$
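
A minimal sketch of the pooling step, assuming `v_a` and `v_b` are the composed sequences of shape (batch, length, dim):

```python
import torch

def pool(v_a, v_b):
    """Average- and max-pool each composed sequence over time, then concatenate."""
    v = torch.cat([v_a.mean(dim=1), v_a.max(dim=1).values,
                   v_b.mean(dim=1), v_b.max(dim=1).values], dim=-1)
    return v  # (batch, 4 * dim)
```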

  • Put $v$ into a final multilayer perceptron (MLP) classifier
  • Use multi-class cross-entropy loss
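
A sketch of the final classifier and loss; the hidden size, activation, and label count (3-way NLI) are assumptions here, not quoted from the paper.

```python
import torch.nn as nn

dim = 600                        # per-position size of the composed vectors (assumed)
classifier = nn.Sequential(      # final MLP over the pooled vector v
    nn.Linear(4 * dim, 300),
    nn.Tanh(),
    nn.Linear(300, 3),           # entailment / neutral / contradiction logits
)
loss_fn = nn.CrossEntropyLoss()  # multi-class cross-entropy
# loss = loss_fn(classifier(v), labels)
```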