[Andrew Ng Deep Learning] 05_week3_quiz: Sequence models & Attention mechanism

(1)Consider using this encoder-decoder model for machine translation.
(figure: encoder-decoder model for machine translation)
This model is a “conditional language model” in the sense that the encoder portion (shown in green) is modeling the probability of the input sentence x.
[A]True
[B]False

Answer: B
Explanation: The encoder produces a feature encoding of the input sentence x, not a probability over x.

(2)In beam search, if you increase the beam width B, which of the following would you expect to be true? Check all that apply.
[A]Beam search will run more slowly.
[B]Beam search will use up more memory.
[C]Beam search will generally find better solutions (i.e. do a better job maximizing P(y|x) )
[D]Beam search will converge after fewer steps.

Answer: A, B, C
Explanation: At every step, beam search keeps the B candidates with the highest probability. The larger B is, the more candidate sentences are kept, so the search runs more slowly and uses more memory, but it generally finds better (higher-probability) results.
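
A minimal sketch of how the beam width B trades quality against speed and memory; the scoring interface is a hypothetical stand-in, not the course's implementation:

```python
import math

def beam_search(step_log_probs, B, max_len):
    """Toy beam search.

    step_log_probs(prefix) is assumed to return {token: log P(token | x, prefix)}.
    Keeping the B highest-scoring prefixes at every step means a larger B explores
    more candidates (slower, more memory) but generally finds a higher P(y|x).
    """
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, lp in step_log_probs(prefix).items():
                candidates.append((prefix + (tok,), score + lp))
        # keep only the B best partial hypotheses
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:B]
    return beams[0]

# toy next-token distribution, identical at every step
toy = lambda prefix: {"a": math.log(0.5), "b": math.log(0.3), "c": math.log(0.2)}
print(beam_search(toy, B=3, max_len=4))   # best prefix and its log-probability
```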

(3)In machine translation, if we carry out beam search without using sentence normalization, the algorithm will tend to output overly short translations.
[A]True
[B]False

Answer: A
Explanation: Beam search maximizes $\prod_{t=1}^{T_y} P\left(y^{<t>} \mid x, y^{<1>}, \dots, y^{<t-1>}\right)$. Every factor is less than 1, so the product shrinks as more terms are multiplied; without length normalization, shorter sentences therefore tend to get higher scores, which biases the algorithm toward overly short translations.
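
The fix discussed in the course is length normalization: instead of the raw product, beam search maximizes the length-normalized log-likelihood, with the softening exponent $\alpha \approx 0.7$ suggested in the lectures:

$$\arg\max_y \; \frac{1}{T_y^{\alpha}} \sum_{t=1}^{T_y} \log P\!\left(y^{<t>} \mid x, y^{<1>}, \dots, y^{<t-1>}\right)$$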

(4)Suppose you are building a speech recognition system, which uses an RNN model to map from audio clip $x$ to a text transcript $y$. Your algorithm uses beam search to try to find the value of $y$ that maximizes $P(y \mid x)$.
On a dev set example, given an input audio clip, your algorithm outputs the transcript $\hat{y} =$ "I'm building an A Eye system in Silly con Valley.", whereas a human gives a much superior transcript $y^{*} =$ "I'm building an AI system in Silicon Valley."
According to your model,
$P(\hat{y} \mid x) = 1.09 \times 10^{-7}$
$P(y^{*} \mid x) = 7.21 \times 10^{-8}$
Would you expect increasing the beam width B to help correct this example?
[A]No, because $P(y^{*} \mid x) \leq P(\hat{y} \mid x)$ indicates the error should be attributed to the RNN rather than to the search algorithm.
[B]No, because $P(y^{*} \mid x) \leq P(\hat{y} \mid x)$ indicates the error should be attributed to the search algorithm rather than to the RNN.
[C]Yes, because $P(y^{*} \mid x) \leq P(\hat{y} \mid x)$ indicates the error should be attributed to the RNN rather than to the search algorithm.
[D]Yes, because $P(y^{*} \mid x) \leq P(\hat{y} \mid x)$ indicates the error should be attributed to the search algorithm rather than to the RNN.

Answer: A
Explanation: See 3.5 "Error analysis in beam search". The search found $\hat{y}$, which the model scores higher than the human transcript $y^{*}$, so beam search did its job; the fault lies with the RNN, and increasing B is not expected to fix this example.

(5)Continuing the example from Q4, suppose you work on your algorithm for a few more weeks, and now find that for the vast majority of examples on which your algorithm makes a mistake, $P(y^{*} \mid x) > P(\hat{y} \mid x)$. This suggests you should focus your attention on improving the search algorithm.
[A]True
[B]False

Answer: A
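
A hedged sketch of the attribution rule from "Error analysis in beam search"; the function name and interface are illustrative, not from the course:

```python
import math

def attribute_error(log_p_ystar, log_p_yhat):
    """Ng's rule of thumb for error analysis with beam search.

    y* = the (better) human transcript, y^ = the algorithm's output.
    If the model already scores y* higher than y^, beam search failed to find
    the higher-probability sentence -> blame the search (try a larger B).
    Otherwise the model itself prefers the worse sentence -> blame the RNN.
    """
    if log_p_ystar > log_p_yhat:
        return "search algorithm (consider increasing beam width B)"
    return "RNN / model (more data, architecture, regularization, ...)"

# Q4's numbers: P(y^|x) = 1.09e-7 > P(y*|x) = 7.21e-8 -> the RNN is at fault
print(attribute_error(math.log(7.21e-8), math.log(1.09e-7)))
```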

(6)Consider the attention model for machine translation.
(figure: attention model for machine translation)
Further, here is the formula for $\alpha^{<t,t'>}$:
$$\alpha^{<t,t'>} = \frac{\exp\left(e^{<t,t'>}\right)}{\sum_{t'=1}^{T_x} \exp\left(e^{<t,t'>}\right)}$$
Which of the following statements about $\alpha^{<t,t'>}$ are true? Check all that apply.
[A]We expect $\alpha^{<t,t'>}$ to be generally larger for values of $a^{<t'>}$ that are highly relevant to the value the network should output for $y^{<t>}$. (Note the indices in the superscripts.)
[B]We expect $\alpha^{<t,t'>}$ to be generally larger for values of $a^{<t>}$ that are highly relevant to the value the network should output for $y^{<t'>}$. (Note the indices in the superscripts.)
[C]$\sum_t \alpha^{<t,t'>} = 1$ (Note the summation is over $t$.)
[D]$\sum_{t'} \alpha^{<t,t'>} = 1$ (Note the summation is over $t'$.)

Answer: A, D
Explanation: The softmax in the formula normalizes over $t'$, so for each output step $t$ the weights $\alpha^{<t,t'>}$ sum to 1 over $t'$; and the weights are larger for the encoder activations $a^{<t'>}$ that are most relevant to producing $y^{<t>}$.
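
A small numpy check of property D (the softmax over $t'$ makes each row of attention weights sum to 1); the scores $e^{<t,t'>}$ here are random placeholders:

```python
import numpy as np

Ty, Tx = 4, 6                        # output and input sequence lengths
e = np.random.randn(Ty, Tx)          # e^{<t,t'>}: unnormalized attention scores

# alpha^{<t,t'>} = exp(e^{<t,t'>}) / sum over t' of exp(e^{<t,t'>})
alpha = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)

print(alpha.sum(axis=1))             # for every output step t, the weights over t' sum to 1
```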

(7)The network learns where to "pay attention" by learning the values $e^{<t,t'>}$, which are computed using a small neural network:
We can't replace $s^{<t-1>}$ with $s^{<t>}$ as an input to this neural network. This is because $s^{<t>}$ depends on $\alpha^{<t,t'>}$, which in turn depends on $e^{<t,t'>}$; so at the time we need to evaluate this network, we haven't computed $s^{<t>}$ yet.
[A]True
[B]False

Answer: A
(figure: the small neural network that computes $e^{<t,t'>}$)
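
A minimal sketch of that small scoring network, assuming (as in the lecture) it takes the previous decoder state $s^{<t-1>}$ together with an encoder activation $a^{<t'>}$ and outputs the scalar $e^{<t,t'>}$; the layer sizes and parameter names are arbitrary:

```python
import numpy as np

def score(s_prev, a_tprime, W1, b1, w2, b2):
    """One-hidden-layer network producing e^{<t,t'>} from [s^{<t-1>}, a^{<t'>}].
    Only s^{<t-1>} is available when the scores are needed, which is why
    s^{<t>} cannot be used as an input (it is computed after the alphas)."""
    h = np.tanh(W1 @ np.concatenate([s_prev, a_tprime]) + b1)
    return float(w2 @ h + b2)

# tiny usage with random parameters
n_s, n_a, n_h = 8, 10, 16
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((n_h, n_s + n_a)), np.zeros(n_h)
w2, b2 = rng.standard_normal(n_h), 0.0
print(score(rng.standard_normal(n_s), rng.standard_normal(n_a), W1, b1, w2, b2))
```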

(8)Compared to the encoder-decoder model shown in Question 1 of this quiz (which does not use an attention mechanism), we expect the attention model to have the greatest advantage when:
[A]The input sequence length $T_x$ is large.
[B]The input sequence length $T_x$ is small.

Answer: A
Explanation: (figure: BLEU score versus sentence length, with and without attention) The green curve shows the BLEU score after adding the attention mechanism; for long sentences, adding attention clearly improves translation accuracy.

(9)Under the CTC model, identical repeated characters not separated by the "blank" character (_) are collapsed. Under the CTC model, what does the following string collapse to?
__coo_o_kk___b_ooooo__oo_kkk
[A]cokbok
[B]cookbook
[C]cook book
[D]coookkboooooookkk
Answer: B
Explanation: The basic CTC rule is to collapse repeated characters that are not separated by a blank, and then remove the blanks, so the string above collapses to "cookbook".
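
A short sketch of the collapse rule applied to the question's string (an illustrative helper, not the course's code):

```python
def ctc_collapse(s, blank="_"):
    """Collapse repeated characters not separated by a blank, then drop blanks."""
    out = []
    prev = None
    for ch in s:
        if ch != prev:            # a run of identical characters collapses to one
            if ch != blank:       # blanks are removed after collapsing
                out.append(ch)
        prev = ch
    return "".join(out)

print(ctc_collapse("__coo_o_kk___b_ooooo__oo_kkk"))  # -> "cookbook"
```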

(10)In trigger word detection, $x^{<t>}$ is:
[A]Features of the audio (such as spectrogram features) at time t.
[B]The t-th input word, represented as either a one-hot vector or a word embedding.
[C]Whether the trigger word is being said at time t.
[D]Whether someone has just finished saying the trigger word at time t.

Answer: A
Explanation: See 3.10 Trigger Word Detection.
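
A hedged sketch of extracting spectrogram features $x^{<t>}$ from an audio clip using scipy; the sampling rate, window length, and overlap are illustrative values, not the course's settings:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000                                   # assumed sampling rate (Hz)
audio = np.random.randn(10 * fs)             # placeholder 10-second clip

# Sxx has shape (n_freq, n_time): column t is the feature vector x^{<t>}
freqs, times, Sxx = spectrogram(audio, fs=fs, nperseg=400, noverlap=240)
x_t = np.log(Sxx + 1e-10).T                  # (T_x, n_freq) log-spectrogram features
print(x_t.shape)
```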
