Preface
After a weekend off my head was well rested, so debugging went unusually smoothly today: I finished building the standard BP neural network and started adjusting its parameters. This post walks through the implementation and my first attempts at tuning the model.
The Neural Network
The network I finally settled on is a 2-4-4-1 fully connected structure: two input neurons, two hidden layers of four neurons each, and a single output neuron.
Compared with the previous model, I increased the input layer to two neurons: one takes the image's entropy rate, the other the image's color distance. Both are computed by the function below:
import numpy as np

def calcuEntImagedist(x):
    x_out = np.zeros((1, 2), dtype=float)
    gray_temp = np.zeros((1, 256), dtype=float)
    # histogram of the 784 pixel values (flattened 28x28 image, gray levels 0..255)
    for i in range(0, 784):
        temp = int(x[i])
        gray_temp[0][temp] = gray_temp[0][temp] + 1
    # turn the counts into probabilities
    for i in range(0, 256):
        gray_temp[0][i] = gray_temp[0][i] / 784
    # Shannon entropy of the gray-level distribution
    result = float(0)
    for i in range(0, 256):
        if gray_temp[0][i] > 0:
            result = result - gray_temp[0][i] * np.log2(gray_temp[0][i])
    # mean pixel value
    sum_tmp = float(0)
    for i in range(0, 784):
        sum_tmp = sum_tmp + x[i]
    sum_tmp = sum_tmp / 784
    # "color distance": cube root of the third central moment
    temp_s = float(0)
    for i in range(0, 784):
        temp_s = temp_s + pow((x[i] - sum_tmp), 3)
    # np.cbrt handles a negative moment; pow(negative, 1/3) would go complex
    temp_s = np.cbrt(temp_s / 784)
    x_out[0][0] = result
    x_out[0][1] = temp_s
    return x_out
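As a quick sanity check, the extractor can be run on a fake image (a hypothetical example; any flattened 28x28 grayscale array works):

x = np.random.randint(0, 256, size=784)  # fake flattened 28x28 grayscale image
print(calcuEntImagedist(x))              # [[entropy, color distance]]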
The entropy is computed by scanning the whole image, building a histogram of the pixel values, converting the 256 gray-level counts into probabilities, and evaluating the Shannon entropy. The color distance starts from the mean pixel value and then measures how the pixels deviate from that mean; as the code shows, it is the cube root of the third central moment (the mean cubed deviation).
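Written out (p_i is the probability of gray level i, \mu the mean pixel value, N = 784), the two features are:

H = -\sum_{i=0}^{255} p_i \log_2 p_i,
\qquad
d = \sqrt[3]{\frac{1}{N}\sum_{j=1}^{N} (x_j - \mu)^3}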
Formula Derivation
The formulas for the modified network follow; first, forward propagation from the input layer to the output layer:
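Reading them off the code (\sigma denotes the sigmoid function, x the 1x2 feature row vector):

m = \sigma(x W - \theta_1), \qquad
n = \sigma(m V - \theta_2), \qquad
\hat{y} = \sigma(n \gamma - \theta_3)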
Next, the gradients propagated back from the output layer toward the input layer:
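For the squared-error loss E = \frac{1}{2}(\hat{y}-y)^2, the chain rule gives the following gradients, written with the shorthand \delta = (\hat{y}-y)\,\hat{y}(1-\hat{y}) (this matches the code below):

\frac{\partial E}{\partial \gamma_k} = \delta\, n_k,
\qquad
\frac{\partial E}{\partial \theta_3} = -\delta

\frac{\partial E}{\partial v_{jk}} = \delta\, \gamma_k\, n_k(1-n_k)\, m_j,
\qquad
\frac{\partial E}{\partial \theta_{2,k}} = -\delta\, \gamma_k\, n_k(1-n_k)

\frac{\partial E}{\partial w_{ij}} = \delta \Big(\sum_k \gamma_k\, n_k(1-n_k)\, v_{jk}\Big) m_j(1-m_j)\, x_i,
\qquad
\frac{\partial E}{\partial \theta_{1,j}} = -\delta \sum_k \gamma_k\, n_k(1-n_k)\, v_{jk}\, m_j(1-m_j)

Each parameter p is then updated by gradient descent, p \leftarrow p - \eta\,\partial E/\partial p, where \eta is study_step in the code.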
The input-to-output pass is implemented below, mainly through matrix multiplication:
# hidden layer 1: m = sigmoid(input_x . w - theta_1)
m = np.dot(input_x, w) - theta_1
for i in range(0, 4):
    m[0][i] = sigmoid(m[0][i])
# hidden layer 2: n = sigmoid(m . v - theta_2)
n = np.dot(m, v) - theta_2
for i in range(0, 4):
    n[0][i] = sigmoid(n[0][i])
# output neuron: y_out = sigmoid(n . gamma - theta_3)
y_out = np.dot(n, gamma) - theta_3
y_out = sigmoid(y_out)
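The element-wise loops can also be collapsed into three lines with a vectorized helper. This is only a sketch: the snippet above never shows its sigmoid definition, so the one below assumes the standard logistic function.

import numpy as np

def sigmoid(z):
    # standard logistic function, applied element-wise by numpy
    return 1.0 / (1.0 + np.exp(-z))

m = sigmoid(np.dot(input_x, w) - theta_1)    # (1, 4) hidden layer 1
n = sigmoid(np.dot(m, v) - theta_2)          # (1, 4) hidden layer 2
y_out = sigmoid(np.dot(n, gamma) - theta_3)  # (1, 1) array rather than a scalar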
The back-propagation of the gradients is implemented below. I had earlier tried swapping in the ReLU activation, but changing the activation means re-deriving every gradient formula, which was too much trouble, so I went back to sigmoid(x):
# output threshold (theta_3 enters the net input with a minus sign, hence the -1)
dtheta_3 = -1 * (y_out - y) * (y_out * (1 - y_out))
# second hidden layer: thresholds theta_2 and output weights gamma
for i in range(0, 4):
    dtheta_2[0][i] = -1 * (y_out - y) * (y_out * (1 - y_out)) * gamma[i] * \
                     (n[0][i] * (1 - n[0][i]))
    dgamma[i] = (y_out - y) * (y_out * (1 - y_out)) * n[0][i]
# hidden-to-hidden weights v and first-layer thresholds theta_1 (accumulated over j)
for i in range(0, 4):
    for j in range(0, 4):
        dv[i][j] = (y_out - y) * (y_out * (1 - y_out)) * (n[0][j] * (1 - n[0][j])) * \
                   gamma[j] * m[0][i]
        dtheta_1[0][i] = -1 * (y_out - y) * (y_out * (1 - y_out)) * gamma[j] * \
                         (n[0][j] * (1 - n[0][j])) * v[i][j] * (m[0][i] * (1 - m[0][i])) + dtheta_1[0][i]
# input weights w: chain through every hidden-2 unit k and the hidden-1 unit j,
# so the activation derivatives must use n[0][k] and m[0][j]
for i in range(0, 2):
    for j in range(0, 4):
        for k in range(0, 4):
            dw[i][j] = dw[i][j] + (y_out - y) * (y_out * (1 - y_out)) * gamma[k] * \
                       (n[0][k] * (1 - n[0][k])) * v[j][k] * (m[0][j] * (1 - m[0][j])) * input_x[0][i]
# gradient-descent updates: every parameter moves against its gradient
# (the dtheta terms above already carry the -1 from the threshold's minus sign)
for i in range(0, 4):
    for j in range(0, 2):
        w[j][i] = w[j][i] - study_step * dw[j][i]
for i in range(0, 4):
    theta_2[0][i] = theta_2[0][i] - study_step * dtheta_2[0][i]
    theta_1[0][i] = theta_1[0][i] - study_step * dtheta_1[0][i]
    gamma[i] = gamma[i] - study_step * dgamma[i]
for i in range(0, 4):
    for j in range(0, 4):
        v[i][j] = v[i][j] - study_step * dv[i][j]
theta_3 = theta_3 - study_step * dtheta_3
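This also makes the earlier remark about ReLU concrete: the only activation-dependent factor in these formulas is the derivative f'(z). For the sigmoid it can be written in terms of the activation itself, which is where every y_out*(1-y_out), n*(1-n) and m*(1-m) term above comes from; switching to ReLU would replace each of those factors:

\sigma'(z) = \sigma(z)\,(1-\sigma(z)),
\qquad
\mathrm{ReLU}'(z) = \begin{cases} 1 & z > 0 \\ 0 & z \le 0 \end{cases}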
Results and Parameter Tuning
First, here are my initialization parameters and the model results. I fit the parameters by looping over a single image many times:
import random
import numpy as np

total_n = 60000        # total number of training images
train_aside_n = 1      # number of images actually trained on
study_step = 0.8       # learning rate
epoch = 200
start_rand_max = 0.4   # upper bound for the uniform random initialization

w = np.zeros((2, 4), dtype=float)
gamma = np.zeros((4, 1), dtype=float)
v = np.zeros((4, 4), dtype=float)
theta_1 = np.zeros((1, 4), dtype=float)
theta_2 = np.zeros((1, 4), dtype=float)
theta_3 = random.uniform(0, start_rand_max)
m = np.ones((1, 4), dtype=float)
n = np.ones((1, 4), dtype=float)
input_x = np.zeros((1, 2), dtype=float)
y_out = float(0)

# fill the weights and thresholds with uniform random values in (0, start_rand_max)
for i in range(0, 4):
    gamma[i] = random.uniform(0, start_rand_max)
    theta_1[0][i] = random.uniform(0, start_rand_max)
    theta_2[0][i] = random.uniform(0, start_rand_max)
    for j in range(0, 4):
        v[i][j] = random.uniform(0, start_rand_max)
    for k in range(0, 2):
        w[k][i] = random.uniform(0, start_rand_max)
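One thing the snippets above do not show is the allocation of the gradient buffers. Since dtheta_1 and dw are accumulated with +=, they must also be reset to zero before every backward pass; a minimal sketch, assuming the shapes used above:

dw = np.zeros((2, 4), dtype=float)
dv = np.zeros((4, 4), dtype=float)
dgamma = np.zeros((4, 1), dtype=float)
dtheta_1 = np.zeros((1, 4), dtype=float)
dtheta_2 = np.zeros((1, 4), dtype=float)
dtheta_3 = float(0)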
Model results:
The decline in the error is still not satisfactory, so tomorrow I will keep tuning and explore how each parameter affects the network. I also found that changing the random initialization values has a large impact on the results.