[AI] [Andrew Ng Deep Learning] Building your Recurrent Neural Network - Step by Step
Building your Recurrent Neural Network - Step by Step

Welcome to Course 5's first assignment! In this assignment, you will implement your first Recurrent Neural Network in numpy. Recurrent Neural Networks (RNN) are very effective for Natural Language Processing and other sequence tasks because they have "memory". They can read inputs $x^{\langle t \rangle}$ (such as words) one at a time, and remember some information/context through the hidden layer activations that get passed from one time-step to the next. This allows a uni-directional RNN to take information from the past to process later inputs. A bidirectional RNN can take context from both the past and the future.

Notation:
- Superscript $\langle t \rangle$ denotes an object at the $t$-th time-step; for example, $x^{\langle t \rangle}$ is the input at time-step $t$.
- Subscripts such as $n_x$ and $n_a$ denote the dimensions of the input and of the hidden state, respectively.
We assume that you are already familiar with numpy. Let's first import all the packages that you will need during this assignment.
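For reference, a minimal, self-contained set of imports and activation helpers is sketched below. The assignment provides similar helpers in its own utility module (rnn_utils); the versions here are re-implementations so that the later sketches run on their own, not the graded ones.

```python
import numpy as np

def sigmoid(x):
    # Element-wise logistic sigmoid, used by the LSTM gates.
    return 1 / (1 + np.exp(-x))

def softmax(x):
    # Column-wise softmax, used to turn the output layer into probabilities.
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / e_x.sum(axis=0, keepdims=True)
```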
1 - Forward propagation for the basic Recurrent Neural Network

Later this week, you will generate music using an RNN. The basic RNN that you will implement has the structure below. In this example, $T_x = T_y$.
Steps:
1. Implement the calculations needed for one time-step of the RNN.
2. Implement a loop over $T_x$ time-steps in order to process all the inputs, one at a time.
Let's go!

1.1 - RNN cell

A recurrent neural network can be seen as the repetition of a single cell. You are first going to implement the computations for a single time-step. The following figure describes the operations for a single time-step of an RNN cell.
Exercise: Implement the RNN-cell described in Figure (2).

Instructions:
1. Compute the hidden state with a tanh activation: $a^{\langle t \rangle} = \tanh(W_{aa} a^{\langle t-1 \rangle} + W_{ax} x^{\langle t \rangle} + b_a)$.
2. Using the new hidden state $a^{\langle t \rangle}$, compute the prediction $\hat{y}^{\langle t \rangle} = \mathrm{softmax}(W_{ya} a^{\langle t \rangle} + b_y)$.
3. Store $(a^{\langle t \rangle}, a^{\langle t-1 \rangle}, x^{\langle t \rangle}, parameters)$ in a cache for the backward pass.
4. Return $a^{\langle t \rangle}$, $\hat{y}^{\langle t \rangle}$ and the cache, as shown in the sketch after the vectorization note below.
We will vectorize over $m$ examples. Thus, $x^{\langle t \rangle}$ will have dimension $(n_x, m)$, and $a^{\langle t \rangle}$ will have dimension $(n_a, m)$.
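A minimal sketch of rnn_cell_forward under these conventions (the parameter dictionary keys Wax, Waa, Wya, ba, by are assumed, not confirmed by this page):

```python
def rnn_cell_forward(xt, a_prev, parameters):
    # Retrieve the parameters (key names assumed).
    Wax, Waa, Wya = parameters["Wax"], parameters["Waa"], parameters["Wya"]
    ba, by = parameters["ba"], parameters["by"]

    # Hidden state: a<t> = tanh(Waa a<t-1> + Wax x<t> + ba)
    a_next = np.tanh(np.dot(Waa, a_prev) + np.dot(Wax, xt) + ba)
    # Prediction: y_hat<t> = softmax(Wya a<t> + by)
    yt_pred = softmax(np.dot(Wya, a_next) + by)

    # Cache the values needed for the backward pass.
    cache = (a_next, a_prev, xt, parameters)
    return a_next, yt_pred, cache
```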
1.2 - RNN forward pass

You can see an RNN as the repetition of the cell you've just built. If your input sequence of data is carried over 10 time steps, then you will copy the RNN cell 10 times. Each cell takes as input the hidden state from the previous cell ($a^{\langle t-1 \rangle}$) and the current time-step's input data ($x^{\langle t \rangle}$). It outputs a hidden state ($a^{\langle t \rangle}$) and a prediction ($y^{\langle t \rangle}$) for this time-step.
Exercise: Code the forward propagation of the RNN described in Figure (3).

Instructions:
1. Create a tensor of zeros $a$ that will store all the hidden states computed by the RNN, and a tensor of zeros $y$ for the predictions.
2. Initialize the "next" hidden state as $a_0$ (the initial hidden state).
3. Loop over each time step $t$: update the "next" hidden state and the cache by running rnn_cell_forward, store the hidden state in $a$, store the prediction in $y$, and append the cache to the list of caches.
4. Return $a$, $y$ and the caches, as in the sketch below.
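A sketch of the full forward pass, assuming the rnn_cell_forward signature above:

```python
def rnn_forward(x, a0, parameters):
    # x has shape (n_x, m, T_x); a0 has shape (n_a, m).
    caches = []
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape

    # Tensors that will hold every hidden state and every prediction.
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))

    a_next = a0
    for t in range(T_x):
        # One step forward, then store the results for time-step t.
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        a[:, :, t] = a_next
        y_pred[:, :, t] = yt_pred
        caches.append(cache)

    caches = (caches, x)
    return a, y_pred, caches
```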
Congratulations! You've successfully built the forward propagation of a recurrent neural network from scratch. This will work well enough for some applications, but it suffers from vanishing gradient problems. So it works best when each output $y^{\langle t \rangle}$ can be estimated using mainly "local" context (meaning information from inputs $x^{\langle t' \rangle}$ where $t'$ is not too far from $t$). In the next part, you will build a more complex LSTM model, which is better at addressing vanishing gradients. The LSTM will be better able to remember a piece of information and keep it saved for many timesteps.

2 - Long Short-Term Memory (LSTM) network

The following figure shows the operations of an LSTM-cell.
Similar to the RNN example above, you will start by implementing the LSTM cell for a single time-step. Then you can iteratively call it from inside a for-loop to have it process an input with $T_x$ time-steps.

About the gates

- Forget gate

For the sake of this illustration, let's assume we are reading words in a piece of text, and want to use an LSTM to keep track of grammatical structures, such as whether the subject is singular or plural. If the subject changes from a singular word to a plural word, we need to find a way to get rid of our previously stored memory value of the singular/plural state. In an LSTM, the forget gate lets us do this:

$$\Gamma_f^{\langle t \rangle} = \sigma(W_f[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f)\tag{1}$$

Here, $W_f$ are weights that govern the forget gate's behavior. We concatenate $[a^{\langle t-1 \rangle}, x^{\langle t \rangle}]$ and multiply by $W_f$. The equation above results in a vector $\Gamma_f^{\langle t \rangle}$ with values between 0 and 1. This forget gate vector will be multiplied element-wise by the previous cell state $c^{\langle t-1 \rangle}$. So if one of the values of $\Gamma_f^{\langle t \rangle}$ is 0 (or close to 0), the LSTM removes that piece of information (e.g. the singular subject) from the corresponding component of $c^{\langle t-1 \rangle}$. If one of the values is 1, it keeps the information.

- Update gate

Once we forget that the subject being discussed is singular, we need to find a way to update it to reflect that the new subject is now plural. Here is the formula for the update gate:

$$\Gamma_u^{\langle t \rangle} = \sigma(W_u[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u)\tag{2}$$

Similar to the forget gate, $\Gamma_u^{\langle t \rangle}$ is again a vector of values between 0 and 1. It will be multiplied element-wise with $\tilde{c}^{\langle t \rangle}$ in order to compute $c^{\langle t \rangle}$.

- Updating the cell

To update the new subject, we need to create a new vector of numbers that we can add to our previous cell state. The equation we use is:

$$\tilde{c}^{\langle t \rangle} = \tanh(W_c[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c)\tag{3}$$

Finally, the new cell state is:

$$c^{\langle t \rangle} = \Gamma_f^{\langle t \rangle} * c^{\langle t-1 \rangle} + \Gamma_u^{\langle t \rangle} * \tilde{c}^{\langle t \rangle} \tag{4}$$

- Output gate

To decide which outputs we will use, we will use the following two formulas:
$$\Gamma_o^{\langle t \rangle} = \sigma(W_o[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o)\tag{5}$$

$$a^{\langle t \rangle} = \Gamma_o^{\langle t \rangle} * \tanh(c^{\langle t \rangle})\tag{6}$$

In equation 5 you decide what to output using a sigmoid function, and in equation 6 you multiply that by the $\tanh$ of the cell state $c^{\langle t \rangle}$.

2.1 - LSTM cell

Exercise: Implement the LSTM cell described in Figure (3).

Instructions: Concatenate $a^{\langle t-1 \rangle}$ and $x^{\langle t \rangle}$ into a single matrix, apply formulas (1)-(6) above, and compute the prediction $y^{\langle t \rangle}$. A sketch is given below.
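A minimal sketch of lstm_cell_forward under the equations above. The parameter key names (Wf, Wi, Wc, Wo, Wy, ...) are assumptions, following the common convention of storing the update gate under "Wi"/"bi":

```python
def lstm_cell_forward(xt, a_prev, c_prev, parameters):
    # Retrieve the parameters (key names assumed).
    Wf, bf = parameters["Wf"], parameters["bf"]
    Wi, bi = parameters["Wi"], parameters["bi"]
    Wc, bc = parameters["Wc"], parameters["bc"]
    Wo, bo = parameters["Wo"], parameters["bo"]
    Wy, by = parameters["Wy"], parameters["by"]

    # Concatenate a<t-1> and x<t> into one (n_a + n_x, m) matrix.
    concat = np.concatenate((a_prev, xt), axis=0)

    # Equations (1)-(6).
    ft = sigmoid(np.dot(Wf, concat) + bf)        # forget gate      (1)
    it = sigmoid(np.dot(Wi, concat) + bi)        # update gate      (2)
    cct = np.tanh(np.dot(Wc, concat) + bc)       # candidate cell   (3)
    c_next = ft * c_prev + it * cct              # new cell state   (4)
    ot = sigmoid(np.dot(Wo, concat) + bo)        # output gate      (5)
    a_next = ot * np.tanh(c_next)                # new hidden state (6)

    # Prediction for this time-step.
    yt_pred = softmax(np.dot(Wy, a_next) + by)

    cache = (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)
    return a_next, c_next, yt_pred, cache
```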
2.2 - Forward pass for LSTM

Now that you have implemented one step of an LSTM, you can iterate it inside a for-loop to process a sequence of $T_x$ inputs.

Exercise: Implement lstm_forward() to run an LSTM over $T_x$ time-steps. A sketch is given below.

Note: $c^{\langle 0 \rangle}$ is initialized with zeros.
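A sketch of the sequence loop, assuming the lstm_cell_forward signature above:

```python
def lstm_forward(x, a0, parameters):
    # x has shape (n_x, m, T_x); a0 has shape (n_a, m).
    caches = []
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wy"].shape

    a = np.zeros((n_a, m, T_x))
    c = np.zeros((n_a, m, T_x))
    y = np.zeros((n_y, m, T_x))

    a_next = a0
    c_next = np.zeros((n_a, m))   # c<0> is initialized with zeros

    for t in range(T_x):
        a_next, c_next, yt, cache = lstm_cell_forward(x[:, :, t], a_next, c_next, parameters)
        a[:, :, t] = a_next
        c[:, :, t] = c_next
        y[:, :, t] = yt
        caches.append(cache)

    caches = (caches, x)
    return a, y, c, caches
```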
Congratulations! You have now implemented the forward passes for the basic RNN and the LSTM. When using a deep learning framework, implementing the forward pass is sufficient to build systems that achieve great performance. The rest of this notebook is optional, and will not be graded.

3 - Backpropagation in recurrent neural networks (OPTIONAL / UNGRADED)

In modern deep learning frameworks, you only have to implement the forward pass, and the framework takes care of the backward pass, so most deep learning engineers do not need to bother with the details of the backward pass. If, however, you are an expert in calculus and want to see the details of backprop in RNNs, you can work through this optional portion of the notebook. When you implemented a simple (fully connected) neural network in an earlier course, you used backpropagation to compute the derivatives with respect to the cost in order to update the parameters. Similarly, in recurrent neural networks you calculate the derivatives with respect to the cost in order to update the parameters. The backprop equations are quite complicated and we did not derive them in lecture. However, we will briefly present them below.

3.1 - Basic RNN backward pass

We will start by computing the backward pass for the basic RNN-cell.
Deriving the one-step backward functions:

To compute rnn_cell_backward you need to compute the following equations; it is a good exercise to derive them by hand.

The derivative of $\tanh$ is $1 - \tanh(x)^2$. You can find the complete proof here. Note that $\mathrm{sech}(x)^2 = 1 - \tanh(x)^2$.

Similarly, for $\frac{\partial a^{\langle t \rangle}}{\partial W_{ax}}$, $\frac{\partial a^{\langle t \rangle}}{\partial W_{aa}}$, $\frac{\partial a^{\langle t \rangle}}{\partial b}$, the derivative of $\tanh(u)$ is $(1-\tanh(u)^2)\,du$. The final two equations also follow the same rule and are derived using the $\tanh$ derivative. Note that the terms are arranged so that the dimensions match.
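Putting these derivatives together, here is a sketch of rnn_cell_backward, assuming the cache layout produced by the rnn_cell_forward sketch above:

```python
def rnn_cell_backward(da_next, cache):
    # Unpack the cache produced by rnn_cell_forward.
    a_next, a_prev, xt, parameters = cache
    Wax, Waa = parameters["Wax"], parameters["Waa"]

    # Backprop through tanh: d(tanh(u)) = (1 - tanh(u)^2) du, with tanh(u) = a_next.
    dtanh = (1 - a_next ** 2) * da_next

    # Gradients of the cost with respect to the inputs and parameters.
    dxt = np.dot(Wax.T, dtanh)
    dWax = np.dot(dtanh, xt.T)
    da_prev = np.dot(Waa.T, dtanh)
    dWaa = np.dot(dtanh, a_prev.T)
    dba = np.sum(dtanh, axis=1, keepdims=True)

    return {"dxt": dxt, "da_prev": da_prev, "dWax": dWax, "dWaa": dWaa, "dba": dba}
```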
Backward pass through the RNN

Computing the gradients of the cost with respect to $a^{\langle t \rangle}$ at every time-step $t$ is useful because it is what helps the gradient backpropagate to the previous RNN-cell. To do so, you need to iterate through all the time steps starting at the end, and at each step, you increment the overall $db_a$, $dW_{aa}$, $dW_{ax}$ and you store $dx$.

Instructions: Implement the rnn_backward function. Initialize the return variables with zeros first, then loop backward through all the time steps, calling rnn_cell_backward at each step and incrementing the global derivatives accordingly. A sketch is given below.
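A sketch of this accumulation loop, assuming the cache layout produced by the rnn_forward sketch above:

```python
def rnn_backward(da, caches):
    # da has shape (n_a, m, T_x): gradient of the cost w.r.t. every hidden state.
    caches_list, x = caches
    a1, a0, x1, parameters = caches_list[0]

    n_a, m, T_x = da.shape
    n_x, m = x1.shape

    # Initialize the accumulated gradients with zeros.
    dx = np.zeros((n_x, m, T_x))
    dWax = np.zeros((n_a, n_x))
    dWaa = np.zeros((n_a, n_a))
    dba = np.zeros((n_a, 1))
    da_prevt = np.zeros((n_a, m))

    # Walk backward through time, adding the gradient flowing in from the
    # output at step t to the gradient coming from step t+1.
    for t in reversed(range(T_x)):
        gradients = rnn_cell_backward(da[:, :, t] + da_prevt, caches_list[t])
        dx[:, :, t] = gradients["dxt"]
        da_prevt = gradients["da_prev"]
        dWax += gradients["dWax"]
        dWaa += gradients["dWaa"]
        dba += gradients["dba"]

    da0 = da_prevt
    return {"dx": dx, "da0": da0, "dWax": dWax, "dWaa": dWaa, "dba": dba}
```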
3.2 - LSTM backward pass

3.2.1 One-step backward

The LSTM backward pass is slightly more complicated than the forward one. We have provided you with all the equations for the LSTM backward pass below. (If you enjoy calculus exercises, feel free to try deriving these from scratch yourself.)

3.2.2 Gate derivatives

$$d\Gamma_o^{\langle t \rangle} = da_{next} * \tanh(c_{next}) * \Gamma_o^{\langle t \rangle} * (1-\Gamma_o^{\langle t \rangle})\tag{7}$$

$$d\tilde{c}^{\langle t \rangle} = dc_{next} * \Gamma_i^{\langle t \rangle} + \Gamma_o^{\langle t \rangle}(1-\tanh(c_{next})^2) * i_t * da_{next} * \tilde{c}^{\langle t \rangle} * (1-(\tilde{c})^2)\tag{8}$$

$$d\Gamma_u^{\langle t \rangle} = dc_{next} * \tilde{c}^{\langle t \rangle} + \Gamma_o^{\langle t \rangle}(1-\tanh(c_{next})^2) * \tilde{c}^{\langle t \rangle} * da_{next} * \Gamma_u^{\langle t \rangle} * (1-\Gamma_u^{\langle t \rangle})\tag{9}$$

$$d\Gamma_f^{\langle t \rangle} = dc_{next} * \tilde{c}_{prev} + \Gamma_o^{\langle t \rangle}(1-\tanh(c_{next})^2) * c_{prev} * da_{next} * \Gamma_f^{\langle t \rangle} * (1-\Gamma_f^{\langle t \rangle})\tag{10}$$

3.2.3 Parameter derivatives
$$dW_f = d\Gamma_f^{\langle t \rangle} * \begin{pmatrix} a_{prev} \\ x_t \end{pmatrix}^T \tag{11}$$

To calculate $db_f, db_u, db_c, db_o$ you just need to sum across the horizontal axis (axis=1) of $d\Gamma_f^{\langle t \rangle}, d\Gamma_u^{\langle t \rangle}, d\tilde{c}^{\langle t \rangle}, d\Gamma_o^{\langle t \rangle}$ respectively. Note that you should use the keepdims=True option so each bias gradient keeps its $(n_a, 1)$ shape.

Finally, you will compute the derivative with respect to the previous hidden state, previous memory state, and input.
$$da_{prev} = W_f^T * d\Gamma_f^{\langle t \rangle} + W_u^T * d\Gamma_u^{\langle t \rangle} + W_c^T * d\tilde{c}^{\langle t \rangle} + W_o^T * d\Gamma_o^{\langle t \rangle} \tag{15}$$
$$dc_{prev} = dc_{next}\Gamma_f^{\langle t \rangle} + \Gamma_o^{\langle t \rangle} * (1-\tanh(c_{next})^2) * \Gamma_f^{\langle t \rangle} * da_{next} \tag{16}$$

Exercise: Implement lstm_cell_backward by computing the equations above.

The derivative calculations in the original notebook use a few small tricks to simplify the computation. The code in this post uses the following formulas instead, which are easier to follow.
$$dW_f = d\Gamma_f^{\langle t \rangle} \cdot \Gamma_f^{\langle t \rangle} \cdot \left(1-\Gamma_f^{\langle t \rangle}\right) \times \left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right]$$

$$db_f = \sum_{batch}{d\Gamma_f^{\langle t \rangle} \cdot \Gamma_f^{\langle t \rangle} \cdot \left(1-\Gamma_f^{\langle t \rangle}\right)}$$

$$da^{\langle t-1 \rangle} = W_f^T \times \left(d\Gamma_f^{\langle t \rangle} \cdot \Gamma_f^{\langle t \rangle} \cdot \left(1-\Gamma_f^{\langle t \rangle}\right)\right) + W_u^T \times \left(d\Gamma_u^{\langle t \rangle} \cdot \Gamma_u^{\langle t \rangle} \cdot \left(1-\Gamma_u^{\langle t \rangle}\right)\right) + W_c^T \times \left(d\tilde{c}^{\langle t \rangle} \cdot \left(1-\left(\tilde{c}^{\langle t \rangle}\right)^2\right)\right) + W_o^T \times \left(d\Gamma_o^{\langle t \rangle} \cdot \Gamma_o^{\langle t \rangle} \cdot \left(1-\Gamma_o^{\langle t \rangle}\right)\right)$$

$$dc^{\langle t-1 \rangle} = \left(dc^{\langle t \rangle} + da^{\langle t \rangle} \cdot \Gamma_o\left(1-\tanh^2\left(c^{\langle t \rangle}\right)\right)\right) \cdot \Gamma_f^{\langle t \rangle}$$

$$dx^{\langle t \rangle} = W_f^T \times \left(d\Gamma_f^{\langle t \rangle} \cdot \Gamma_f^{\langle t \rangle} \cdot \left(1-\Gamma_f^{\langle t \rangle}\right)\right) + W_u^T \times \left(d\Gamma_u^{\langle t \rangle} \cdot \Gamma_u^{\langle t \rangle} \cdot \left(1-\Gamma_u^{\langle t \rangle}\right)\right) + W_c^T \times \left(d\tilde{c}^{\langle t \rangle} \cdot \left(1-\left(\tilde{c}^{\langle t \rangle}\right)^2\right)\right) + W_o^T \times \left(d\Gamma_o^{\langle t \rangle} \cdot \Gamma_o^{\langle t \rangle} \cdot \left(1-\Gamma_o^{\langle t \rangle}\right)\right)$$
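Under the cache layout of the lstm_cell_forward sketch above, these formulas translate roughly into the code below. This is a sketch, not the graded solution: the update gate is stored under the assumed keys "Wi"/"bi", and dft, dit, dcct, dot denote the gradients with respect to the gates' pre-activations.

```python
def lstm_cell_backward(da_next, dc_next, cache):
    # Unpack the cache produced by the lstm_cell_forward sketch above.
    a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters = cache
    n_a, m = a_next.shape

    # Total gradient flowing into c<t>: the part coming directly from dc<t>
    # plus the part coming through a<t> = Gamma_o * tanh(c<t>).
    dc = dc_next + da_next * ot * (1 - np.tanh(c_next) ** 2)

    # Gradients w.r.t. the gates' pre-activations (the dGamma * Gamma * (1 - Gamma)
    # and dc~ * (1 - c~^2) factors in the formulas above).
    dot = da_next * np.tanh(c_next) * ot * (1 - ot)
    dcct = dc * it * (1 - cct ** 2)
    dit = dc * cct * it * (1 - it)
    dft = dc * c_prev * ft * (1 - ft)

    # Parameter gradients: multiply by the concatenated input [a<t-1>, x<t>]^T.
    concat = np.concatenate((a_prev, xt), axis=0)
    dWf = np.dot(dft, concat.T)
    dWi = np.dot(dit, concat.T)
    dWc = np.dot(dcct, concat.T)
    dWo = np.dot(dot, concat.T)
    dbf = np.sum(dft, axis=1, keepdims=True)
    dbi = np.sum(dit, axis=1, keepdims=True)
    dbc = np.sum(dcct, axis=1, keepdims=True)
    dbo = np.sum(dot, axis=1, keepdims=True)

    # The weight matrices act on the concatenation, so their first n_a columns
    # correspond to a<t-1> and the remaining columns to x<t>.
    Wf, Wi = parameters["Wf"], parameters["Wi"]
    Wc, Wo = parameters["Wc"], parameters["Wo"]
    da_prev = (np.dot(Wf[:, :n_a].T, dft) + np.dot(Wi[:, :n_a].T, dit)
               + np.dot(Wc[:, :n_a].T, dcct) + np.dot(Wo[:, :n_a].T, dot))
    dc_prev = dc * ft
    dxt = (np.dot(Wf[:, n_a:].T, dft) + np.dot(Wi[:, n_a:].T, dit)
           + np.dot(Wc[:, n_a:].T, dcct) + np.dot(Wo[:, n_a:].T, dot))

    return {"dxt": dxt, "da_prev": da_prev, "dc_prev": dc_prev,
            "dWf": dWf, "dbf": dbf, "dWi": dWi, "dbi": dbi,
            "dWc": dWc, "dbc": dbc, "dWo": dWo, "dbo": dbo}
```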
3.3 Backward pass through the LSTM RNN

This part is very similar to the rnn_backward function you implemented above: iterate over all the time steps starting from the end, call lstm_cell_backward at each step, and accumulate the parameter gradients.

Instructions: Implement the lstm_backward function. Create the return variables initialized with zeros, then loop backward through the time steps, calling lstm_cell_backward and incrementing each global gradient. A sketch follows below.
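A sketch of that loop, assuming the lstm_cell_backward signature above:

```python
def lstm_backward(da, caches):
    # da has shape (n_a, m, T_x): gradient of the cost w.r.t. every hidden state.
    caches_list, x = caches
    a1, c1, a0, c0, f1, i1, cc1, o1, x1, parameters = caches_list[0]

    n_a, m, T_x = da.shape
    n_x, m = x1.shape

    # Initialize the accumulated gradients with zeros.
    dx = np.zeros((n_x, m, T_x))
    da_prevt = np.zeros((n_a, m))
    dc_prevt = np.zeros((n_a, m))
    dWf = np.zeros((n_a, n_a + n_x)); dbf = np.zeros((n_a, 1))
    dWi = np.zeros((n_a, n_a + n_x)); dbi = np.zeros((n_a, 1))
    dWc = np.zeros((n_a, n_a + n_x)); dbc = np.zeros((n_a, 1))
    dWo = np.zeros((n_a, n_a + n_x)); dbo = np.zeros((n_a, 1))

    # Walk backward through time, passing the hidden- and cell-state gradients
    # from step t back to step t-1.
    for t in reversed(range(T_x)):
        grads = lstm_cell_backward(da[:, :, t] + da_prevt, dc_prevt, caches_list[t])
        dx[:, :, t] = grads["dxt"]
        da_prevt, dc_prevt = grads["da_prev"], grads["dc_prev"]
        dWf += grads["dWf"]; dbf += grads["dbf"]
        dWi += grads["dWi"]; dbi += grads["dbi"]
        dWc += grads["dWc"]; dbc += grads["dbc"]
        dWo += grads["dWo"]; dbo += grads["dbo"]

    da0 = da_prevt
    return {"dx": dx, "da0": da0,
            "dWf": dWf, "dbf": dbf, "dWi": dWi, "dbi": dbi,
            "dWc": dWc, "dbc": dbc, "dWo": dWo, "dbo": dbo}
```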
Note: the expected output given in the original notebook for this part is incorrect.

Congratulations!

Congratulations on completing this assignment. You now understand how recurrent neural networks work! Let's go on to the next exercise, where you'll use an RNN to build a character-level language model.