The conclusion, up front:
For a unidirectional LSTM, the last layer's final hidden state equals the last time step of output:
h_n[-1, :, :] == output[:, -1, :]
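A quick sanity check of this equivalence; the layer sizes below are arbitrary illustration values:

```python
import torch
import torch.nn as nn

# Single-layer, unidirectional LSTM; all sizes are arbitrary
rnn = nn.LSTM(input_size=2, hidden_size=3, batch_first=True)
x = torch.randn(5, 4, 2)     # [batch_size, seq_len, input_size]
output, (h_n, c_n) = rnn(x)

# The last layer's final hidden state equals the last time step of output
assert torch.equal(h_n[-1], output[:, -1, :])
```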
For a bidirectional (single-layer) LSTM, where the two directions give h_n indices 0 and 1:
output[:, -1, :hidden_size] == h_n[0, :, :]  # forward direction: final state is at the last time step
output[:, 0, hidden_size:] == h_n[1, :, :]   # backward direction: final state is at the first time step
output shape: [batch_size, sequence_length, hidden_size * num_directions]
h_n shape:    [num_layers * num_directions, batch_size, hidden_size]
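These shape formulas can be verified directly; a minimal sketch with a 2-layer bidirectional LSTM (all sizes arbitrary):

```python
import torch
import torch.nn as nn

num_layers, num_directions, hidden_size = 2, 2, 3
rnn = nn.LSTM(input_size=2, hidden_size=hidden_size, num_layers=num_layers,
              batch_first=True, bidirectional=True)
output, (h_n, c_n) = rnn(torch.randn(5, 4, 2))  # batch_size=5, seq_len=4

assert output.shape == (5, 4, hidden_size * num_directions)        # [batch, seq_len, hidden*dirs]
assert h_n.shape == (num_layers * num_directions, 5, hidden_size)  # [layers*dirs, batch, hidden]
```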
The formulas above are illustrated below with a bidirectional LSTM; the unidirectional case follows from it even more simply.
import torch
import torch.nn as nn
rnn = nn.LSTM(input_size=2, hidden_size=3, batch_first=True, bidirectional=True)
input = torch.randn(5, 4, 2)  # [batch_size, seq_len, input_size]
output, (hn, cn) = rnn(input)
output.shape
Out[7]: torch.Size([5, 4, 6])  # [batch_size, seq_len, hidden_size * 2]
hn.shape
Out[9]: torch.Size([2, 5, 3])  # [num_layers * 2, batch_size, hidden_size]
output  # full printout shown in the figure below
hn      # full printout shown in the figure below
In the figure, boxes of the same color mark the same data, which gives the following conclusion:
hn[0,:,:] == output[:,-1,:3]
'''
hn[0,:,:]:
tensor([[ 0.4454,  0.2846,  0.1676],
        [ 0.0321,  0.0556, -0.0778],
        [ 0.1879,  0.1303, -0.1425],
        [ 0.0959,  0.1590,  0.0034],
        [ 0.1305,  0.1002,  0.0781]], grad_fn=<SliceBackward0>)
'''
'''
output[:,-1,:3]:
tensor([[ 0.4454,  0.2846,  0.1676],
        [ 0.0321,  0.0556, -0.0778],
        [ 0.1879,  0.1303, -0.1425],
        [ 0.0959,  0.1590,  0.0034],
        [ 0.1305,  0.1002,  0.0781]], grad_fn=<SliceBackward0>)
'''
(Figure: full printouts of the output and h_n tensors, with matching data highlighted in the same color.)
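Both bidirectional equalities, including the backward direction (hn[1] against the first time step) that the printouts above do not show, can be checked programmatically; a minimal sketch with the same sizes as the session above:

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=2, hidden_size=3, batch_first=True, bidirectional=True)
x = torch.randn(5, 4, 2)      # [batch_size, seq_len, input_size]
output, (hn, cn) = rnn(x)

# Forward direction: its final state is the LAST time step, first 3 channels
assert torch.equal(hn[0], output[:, -1, :3])
# Backward direction: its final state is the FIRST time step, last 3 channels
assert torch.equal(hn[1], output[:, 0, 3:])
```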