PyTorch GRU Forward Pass: A Runnable Python Implementation
1. Background
For a trained neural network model, deployment only requires the forward-pass computation; backpropagation is no longer needed. For hybrid models such as the rnnoise noise-suppression algorithm, putting the algorithm into production means running the network on low-power devices, which usually calls for a C implementation. This post is a personal note that briefly describes how to port a GRU network to C.
2. The GRU network in PyTorch
This post does not revisit the theory behind the GRU; instead, let's look at how PyTorch computes a GRU step:
$$
r_t=\sigma(W_{ir}x_t+b_{ir}+W_{hr}h_{(t-1)}+b_{hr})\\
z_t=\sigma(W_{iz}x_t+b_{iz}+W_{hz}h_{(t-1)}+b_{hz})\\
n_t=\tanh(W_{in}x_t+b_{in}+r_t*(W_{hn}h_{(t-1)}+b_{hn}))\\
h_t=(1-z_t)*n_t+z_t*h_{(t-1)}
$$

In the equations above, $\sigma$ is the sigmoid function, $x_t$ is the input at the current time step, and $h_{(t-1)}$ is the hidden state from the previous time step. Note that there are six weight matrices $W$ and six bias vectors $b$ in total. The PyTorch documentation describes these parameters as follows:
- ~GRU.weight_ih_l[k] – the learnable input-hidden weights of the $k^{th}$ layer (W_ir|W_iz|W_in), of shape (3*hidden_size, input_size) for k = 0. Otherwise, the shape is (3*hidden_size, num_directions * hidden_size)
- ~GRU.weight_hh_l[k] – the learnable hidden-hidden weights of the $k^{th}$ layer (W_hr|W_hz|W_hn), of shape (3*hidden_size, hidden_size)
- ~GRU.bias_ih_l[k] – the learnable input-hidden bias of the $k^{th}$ layer (b_ir|b_iz|b_in), of shape (3*hidden_size)
- ~GRU.bias_hh_l[k] – the learnable hidden-hidden bias of the $k^{th}$ layer (b_hr|b_hz|b_hn), of shape (3*hidden_size)
Considering only a single layer, the parameters of each GRU layer consist of four parts: weight_ih, weight_hh, bias_ih, and bias_hh. Take weight_ih and bias_ih as examples: weight_ih is a matrix of shape (3*hidden_size, input_size), formed by stacking the three matrices W_ir, W_iz, and W_in, each of shape (hidden_size, input_size), along dim 0. bias_ih is a vector of length 3*hidden_size (a (3*hidden_size, 1) column vector in the equations above), formed by stacking b_ir, b_iz, and b_in, each of length hidden_size, along dim 0.
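As a quick sanity check of this layout, the following minimal sketch (my addition, not from the original post; the sizes input_size=10 and hidden_size=5 are only for illustration) prints the stacked shapes and uses torch.chunk to split weight_ih and bias_ih back into their per-gate blocks:

    import torch
    from torch import nn

    input_size, hidden_size = 10, 5
    gru = nn.GRU(input_size, hidden_size, batch_first=True)

    # stacked parameters as documented above
    print(gru.weight_ih_l0.shape)   # torch.Size([15, 10]) == (3*hidden_size, input_size)
    print(gru.bias_ih_l0.shape)     # torch.Size([15])     == (3*hidden_size,)

    # split along dim 0 into the r, z, n blocks
    W_ir, W_iz, W_in = torch.chunk(gru.weight_ih_l0, 3, dim=0)   # each (hidden_size, input_size)
    b_ir, b_iz, b_in = torch.chunk(gru.bias_ih_l0, 3, dim=0)     # each (hidden_size,)
    print(W_ir.shape, b_ir.shape)   # torch.Size([5, 10]) torch.Size([5])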
The input $x_t$ has shape (input_size, 1), so $W_{ir}x_t+b_{ir}$ ends up as a (hidden_size, 1) matrix. By the same reasoning, $W_{hr}h_{(t-1)}+b_{hr}$ is also a (hidden_size, 1) matrix, and the computations of $z_t$ and $n_t$ follow exactly the same pattern.
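To make the shape bookkeeping concrete, here is a minimal vectorized sketch of a single GRU step written directly from the equations above (my addition, not the author's code; the random input, zero initial state, and fixed seed are assumptions for illustration), checked against PyTorch's own forward pass:

    import torch
    from torch import nn

    torch.manual_seed(0)
    input_size, hidden_size = 10, 5
    gru = nn.GRU(input_size, hidden_size, batch_first=True)

    x_t = torch.randn(input_size)       # current input, shape (input_size,)
    h_prev = torch.zeros(hidden_size)   # previous hidden state, shape (hidden_size,)

    # split the stacked parameters into per-gate blocks
    W_ir, W_iz, W_in = torch.chunk(gru.weight_ih_l0, 3, dim=0)
    W_hr, W_hz, W_hn = torch.chunk(gru.weight_hh_l0, 3, dim=0)
    b_ir, b_iz, b_in = torch.chunk(gru.bias_ih_l0, 3, dim=0)
    b_hr, b_hz, b_hn = torch.chunk(gru.bias_hh_l0, 3, dim=0)

    # one GRU step in matrix form; every intermediate has shape (hidden_size,)
    r = torch.sigmoid(W_ir @ x_t + b_ir + W_hr @ h_prev + b_hr)
    z = torch.sigmoid(W_iz @ x_t + b_iz + W_hz @ h_prev + b_hz)
    n = torch.tanh(W_in @ x_t + b_in + r * (W_hn @ h_prev + b_hn))
    h_t = (1 - z) * n + z * h_prev

    # PyTorch forward pass for the same single step (batch=1, seq_len=1)
    ref, _ = gru(x_t.reshape(1, 1, -1))
    print(torch.allclose(h_t, ref.reshape(-1), atol=1e-6))   # expected: True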
3. Code demonstration
- Extracting the network parameters. The code below builds a single-layer GRU with input_size=10 and hidden_size=5 (batch_first=True), with randomly initialized parameters, and prints those parameters:

import torch
from torch import nn
import torch.nn.functional as F
class GRUtest(nn.Module):
    def __init__(self, input, hidden, act):
        super().__init__()
        self.gru = nn.GRU(input, hidden, batch_first=True)
        if act == 'sigmoid':
            self.act = nn.Sigmoid()
        elif act == 'tanh':
            self.act = nn.Tanh()
        elif act == 'relu':
            self.act = nn.ReLU()

    def forward(self, x):
        self.gru.flatten_parameters()
        gru_out, gru_state = self.gru(x)
        return gru_out, gru_state

if __name__ == '__main__':
    insize = 10
    hsize = 5
    net1 = GRUtest(insize, hsize, 'tanh')
    for name, parameters in net1.named_parameters():
        print(name)
        print(parameters)
The output is as follows:

gru.weight_ih_l0
Parameter containing:
tensor([[-0.2723, 0.3715, 0.2461, 0.1564, -0.3429, 0.3451, 0.1402, 0.3094,
-0.1759, 0.0948],
...
[-0.2211, -0.3684, 0.1786, -0.0130, -0.0834, -0.0744, -0.3496, 0.1268,
0.0111, -0.3086]], requires_grad=True)
gru.weight_hh_l0
Parameter containing:
tensor([[ 0.1683, -0.0090, -0.4325, 0.2406, 0.2392],
...
[ 0.1703, 0.3895, 0.1127, -0.1311, 0.1465],
[-0.0391, -0.3496, -0.1727, 0.2034, 0.0147]], requires_grad=True)
gru.bias_ih_l0
Parameter containing:
tensor([ 0.1650, -0.2618, 0.4228, -0.1866, 0.0954, -0.2185, -0.2157, 0.2003,
-0.1248, -0.2836, -0.1828, 0.3261, 0.2692, 0.2722, -0.3817],
requires_grad=True)
gru.bias_hh_l0
Parameter containing:
tensor([ 0.2106, 0.1117, -0.3007, 0.0141, 0.0894, -0.2416, -0.1887, 0.3648,
-0.0361, -0.0047, -0.2830, -0.2674, 0.4117, 0.1664, -0.0708],
requires_grad=True)
As expected, the output is exactly the four tensors, corresponding to the weight_ih, weight_hh, bias_ih, and bias_hh described above.
- Forward-pass Python code. To verify the computation, we first print and save the parameters of a randomly generated GRU network; we then load the model with PyTorch's load function and, separately, write our own forward function using the printed parameters, and compare the results of the two approaches. One thing to note: the GRU has no output gate. For a given GRU layer, once $x_t$ enters the network, the hidden state $h_{t-1}$ is updated to $h_t$ by the computations above, and $h_t$ is that layer's output at the current time step; concatenating the $h$ of every time step gives the layer's overall output. The code is as follows:

import torch
from torch import nn
import numpy as np
import torch.nn.functional as F
weight_ih = torch.tensor([[ 0.3162, 0.0833, 0.1223, 0.4317, -0.2017, 0.1417, -0.1990, 0.3196,
0.3572, -0.4123],
[ 0.3818, 0.2136, 0.1949, 0.1841, 0.3718, -0.0590, -0.3782, -0.1283,
-0.3150, 0.0296],
[-0.0835, -0.2399, -0.0407, 0.4237, -0.0353, 0.0142, -0.0697, 0.0703,
0.3985, 0.2735],
[ 0.1587, 0.0972, 0.1054, 0.1728, -0.0578, -0.4156, -0.2766, 0.3817,
0.0267, -0.3623],
[ 0.0705, 0.3695, -0.4226, -0.3011, -0.1781, 0.0180, -0.1043, -0.0491,
-0.4360, 0.2094],
[ 0.3925, 0.2734, -0.3167, -0.3605, 0.1857, 0.0100, 0.1833, -0.4370,
-0.0267, 0.3154],
[ 0.2075, 0.0163, 0.0879, -0.0423, -0.2459, -0.1690, -0.2723, 0.3715,
0.2461, 0.1564],
[-0.3429, 0.3451, 0.1402, 0.3094, -0.1759, 0.0948, 0.4367, 0.3008,
0.3587, -0.0939],
[ 0.3407, -0.3503, 0.0387, -0.2518, -0.1043, -0.1145, 0.0335, 0.4070,
0.2214, -0.0019],
[ 0.3175, -0.2292, 0.2305, -0.0415, -0.0778, 0.0524, -0.3426, 0.0517,
0.1504, 0.3823],
[-0.1392, 0.1610, 0.4470, -0.1918, 0.4251, -0.2220, 0.1971, 0.1752,
0.1249, 0.3537],
[-0.1807, 0.1175, 0.0025, -0.3364, -0.1086, -0.2987, 0.1977, 0.0402,
0.0438, -0.1357],
[ 0.0022, -0.1391, 0.1285, 0.4343, 0.0677, -0.1981, -0.2732, 0.0342,
-0.3318, -0.3361],
[-0.2911, -0.1519, 0.0331, 0.3080, 0.1732, 0.3426, -0.2808, 0.0377,
-0.3975, 0.2565],
[ 0.0932, 0.4326, -0.3181, 0.3586, 0.3775, 0.3616, 0.0638, 0.4066,
0.2987, 0.3337]])
weight_hh = torch.tensor([[-0.0291, -0.3432, -0.0056, 0.0839, -0.3046],
[-0.2565, -0.4288, -0.1568, 0.3896, 0.0765],
[-0.0273, 0.0180, 0.2789, -0.3949, -0.3451],
[-0.1487, -0.2574, 0.2307, 0.3160, -0.4339],
[-0.3795, -0.4355, 0.1687, 0.3599, -0.3467],
[-0.2070, 0.1423, -0.2920, 0.3799, 0.1043],
[-0.1245, 0.0290, 0.1394, -0.1581, -0.3465],
[ 0.0030, 0.0081, 0.0090, -0.0653, 0.2871],
[-0.1248, -0.0433, 0.1839, -0.2815, 0.1197],
[-0.0989, 0.2145, -0.2426, 0.0165, 0.0438],
[-0.3598, -0.3252, 0.1715, -0.1302, 0.2656],
[-0.4418, -0.2211, -0.3684, 0.1786, -0.0130],
[-0.0834, -0.0744, -0.3496, 0.1268, 0.0111],
[-0.3086, 0.1683, -0.0090, -0.4325, 0.2406],
[ 0.2392, -0.0843, -0.3088, 0.0180, 0.3375]])
bias_ih = torch.tensor([ 0.4094, -0.3376, -0.2020, 0.3482, 0.2186, 0.2768, -0.2226, 0.3853,
-0.3676, -0.0215, 0.0093, 0.0751, -0.3375, 0.4103, 0.4395])
bias_hh = torch.tensor([-0.3088, 0.0165, -0.2382, 0.4288, 0.2494, 0.2634, 0.1443, -0.0445,
0.2518, 0.0076, -0.1631, 0.2309, 0.1403, -0.1159, -0.1226])
class GRUtest(nn.Module):
    def __init__(self, input, hidden, act):
        super().__init__()
        self.gru = nn.GRU(input, hidden, batch_first=True)
        if act == 'sigmoid':
            self.act = nn.Sigmoid()
        elif act == 'tanh':
            self.act = nn.Tanh()
        elif act == 'relu':
            self.act = nn.ReLU()

    def forward(self, x):
        self.gru.flatten_parameters()
        gru_out, gru_state = self.gru(x)
        return gru_out, gru_state
class GRULayer:
    def __init__(self, input_size, hidden_size, act):
        self.bias_ih = bias_ih.reshape(-1)
        self.bias_hh = bias_hh.reshape(-1)
        self.weight_ih = weight_ih.reshape(-1)
        self.weight_hh = weight_hh.reshape(-1)
        self.nb_input = input_size
        self.nb_neurons = hidden_size
        self.activation = act
def compute_gru(gru, state, input):
    M = gru.nb_input
    N = gru.nb_neurons
    r = torch.zeros(N)
    z = torch.zeros(N)
    n = torch.zeros(N)
    h_new = torch.zeros(N)
    # reset gate: r = sigmoid(W_ir x + b_ir + W_hr h + b_hr)
    for i in range(N):
        sum = gru.bias_ih[0*N + i] + gru.bias_hh[0*N + i]
        for j in range(M):
            sum += input[j] * gru.weight_ih[0*M*N + i*M + j]
        for j in range(N):
            sum += state[j] * gru.weight_hh[0*N*N + i*N + j]
        r[i] = torch.sigmoid(sum)
    # update gate: z = sigmoid(W_iz x + b_iz + W_hz h + b_hz)
    for i in range(N):
        sum = gru.bias_ih[1*N + i] + gru.bias_hh[1*N + i]
        for j in range(M):
            sum += input[j] * gru.weight_ih[1*M*N + i*M + j]
        for j in range(N):
            sum += state[j] * gru.weight_hh[1*N*N + i*N + j]
        z[i] = torch.sigmoid(sum)
    # candidate state: n = tanh(W_in x + b_in + r * (W_hn h + b_hn))
    for i in range(N):
        sum = gru.bias_ih[2*N + i]
        tmp = 0
        for j in range(M):
            sum += input[j] * gru.weight_ih[2*M*N + i*M + j]
        for j in range(N):
            tmp += state[j] * gru.weight_hh[2*N*N + i*N + j]
        sum += r[i] * (tmp + gru.bias_hh[2*N + i])
        n[i] = torch.tanh(sum)
    # new hidden state: h = (1 - z) * n + z * h_prev, written back into state
    for i in range(N):
        h_new[i] = (1 - z[i]) * n[i] + z[i] * state[i]
        state[i] = h_new[i]
b = torch.randn((1, 5, 10))
if __name__ == '__main__':
    insize = 10
    hsize = 5
    net1 = GRUtest(insize, hsize, 'tanh')
    model_ckpt1 = torch.load('./nn_test.pkl')
    net1.load_state_dict(model_ckpt1.state_dict())
    gru = GRULayer(insize, hsize, 'tanh')
    out = torch.zeros((5, 5))
    state = torch.zeros(5)
    for i in range(5):
        input = b[0][i]
        compute_gru(gru, state, input)
        out[i] = state
    print("Result of our own forward pass:")
    print(out)
    print("Result of PyTorch's forward pass:")
    torch_out, _ = net1(b)
    print(torch_out)
The results are:

Result of our own forward pass:
tensor([[-0.1810, 0.1028, -0.2076, -0.0975, 0.1328],
[-0.2521, -0.4217, 0.1996, 0.4948, 0.2553],
[-0.1471, 0.2741, 0.0375, -0.1926, -0.1080],
[-0.7646, 0.0691, -0.1276, 0.0147, -0.0271],
[-0.6323, 0.1059, 0.0936, 0.1193, -0.2436]])
Result of PyTorch's forward pass:
tensor([[[-0.1810, 0.1028, -0.2076, -0.0975, 0.1328],
[-0.2522, -0.4217, 0.1996, 0.4948, 0.2553],
[-0.1471, 0.2741, 0.0375, -0.1926, -0.1079],
[-0.7646, 0.0691, -0.1276, 0.0147, -0.0271],
[-0.6323, 0.1059, 0.0937, 0.1193, -0.2436]]],
grad_fn=<TransposeBackward1>)
As we can see, the two results are essentially identical; the occasional difference in the last decimal place is just floating-point rounding.
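A practical note: the script above loads './nn_test.pkl', which is provided via the download link in section 5. If you would rather regenerate a checkpoint yourself, a minimal sketch (my addition, not part of the original post) is to save the model created by the first snippet; the hard-coded weight_ih/weight_hh/bias_ih/bias_hh tensors in the script must then be replaced with the newly printed values so that both computations use the same weights:

    # hypothetical regeneration step, not part of the original post
    net1 = GRUtest(10, 5, 'tanh')
    torch.save(net1, './nn_test.pkl')   # whole-module save, matching torch.load('./nn_test.pkl').state_dict() above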
4. Extending to C
Following the Python code above, the C version is straightforward to write. My own C code is still being cleaned up and will be added in a future update.
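As a possible preparatory step for the C port (my own sketch, not the author's pending code; the helper name export_gru_to_header and the file name gru_weights.h are hypothetical), the parameters can be flattened and dumped into a C header so a C forward function can index them exactly the way compute_gru does above:

    import torch
    from torch import nn

    def export_gru_to_header(gru_module, path='gru_weights.h'):
        # flatten the four stacked parameter tensors in the order compute_gru expects
        names = ['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0']
        with open(path, 'w') as f:
            f.write('/* auto-generated single-layer GRU parameters */\n')
            for name in names:
                values = getattr(gru_module, name).detach().reshape(-1).tolist()
                body = ', '.join('%.8ff' % v for v in values)
                f.write('static const float gru_%s[%d] = {%s};\n' % (name, len(values), body))

    # usage sketch with assumed sizes
    gru = nn.GRU(10, 5, batch_first=True)
    export_gru_to_header(gru)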
5. Additional notes
The model file nn_test.pkl is available on Baidu Netdisk: https://pan.baidu.com/s/1wu-i_1X1YuDJygcxPsKi2w (extraction code: razn)