1. Forward propagation
1.1 Input layer
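Each image is naturally an m*n two-dimensional array, but here we flatten it: the two dimensions are unrolled into one. Each row that would normally sit below the previous one in the original image is instead appended directly after it, so the data of each image becomes one-dimensional. A large number of these flattened images are used for training. Training runs for t iterations, each iteration selects batch images (batch flattened arrays) to train on, and each image is trained on epoch times.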
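As a minimal sketch of the flattening step described above, assuming row-by-row order: the class `FlattenSketch` and its methods `flatten` and `unflatten` are hypothetical names introduced here for illustration and are not part of the original code base.

```java
/** Hypothetical helper for illustration; not part of the original code base. */
final class FlattenSketch {

    /** Flattens an m*n image row by row into a one-dimensional array of length m*n. */
    static double[] flatten(double[][] image) {
        int m = image.length;
        int n = image[0].length;
        double[] flat = new double[m * n];
        for (int i = 0; i < m; i++) {
            // Row i starts at offset i * n in the flattened array.
            System.arraycopy(image[i], 0, flat, i * n, n);
        }
        return flat;
    }

    /** Restores the m*n layout from a flattened array (the inverse of flatten). */
    static double[][] unflatten(double[] flat, int m, int n) {
        double[][] image = new double[m][n];
        for (int i = 0; i < m; i++) {
            System.arraycopy(flat, i * n, image[i], 0, n);
        }
        return image;
    }
}
```

The network still needs the two-dimensional layout for convolution, so a flattened image has to be reshaped (as in `unflatten`) before it reaches the first convolutional layer.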
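1.2 Convolutional layer
The figure below uses a stride of 2. Note that the convolution code that follows moves the kernel one position at a time, that is, with a stride of 1. The convolution is implemented in code as follows: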
// Computes every output map of a convolutional layer from the maps of the previous layer.
private void setConvolutionOutput(final CnnLayer paraLayer, final CnnLayer paraLastLayer) {
    final int lastMapNum = paraLastLayer.getOutMapNum();
    for (int j = 0; j < paraLayer.getOutMapNum(); j++) {
        // Accumulate the convolutions of all input maps i for output map j.
        double[][] tempSumMatrix = null;
        for (int i = 0; i < lastMapNum; i++) {
            double[][] lastMap = paraLastLayer.getMap(i);
            // Each (input map i, output map j) pair has its own kernel.
            double[][] kernel = paraLayer.getKernel(i, j);
            if (tempSumMatrix == null) {
                tempSumMatrix = MathUtils.convnValid(lastMap, kernel);
            } else {
                tempSumMatrix = MathUtils.matrixOp(MathUtils.convnValid(lastMap, kernel),
                        tempSumMatrix, null, null, MathUtils.plus);
            }
        }

        // Add the bias of output map j and apply the sigmoid activation element-wise.
        final double bias = paraLayer.getBias(j);
        tempSumMatrix = MathUtils.matrixOp(tempSumMatrix, new Operator() {
            private static final long serialVersionUID = 2469461972825890810L;

            @Override
            public double process(double value) {
                return MathUtils.sigmod(value + bias);
            }
        });
        paraLayer.setMapValue(j, tempSumMatrix);
    }
}
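In formula form, the loop above can be summarized as follows. The symbols are notation introduced here for illustration: $x_i$ is the $i$-th map of the previous layer, $k_{ij}$ the kernel between input map $i$ and output map $j$, $b_j$ the bias of output map $j$, $N$ the number of input maps, and $\sigma$ the sigmoid.

```latex
a_j = \sigma\!\left( \sum_{i=0}^{N-1} \left( x_i \star k_{ij} \right) + b_j \right),
\qquad
(x \star k)[r][c] = \sum_{p}\sum_{q} x[r+p][c+q]\, k[p][q]
```

Here $\star$ is the sliding-window sum that `convnValid` computes: the kernel is applied without being flipped, and only positions where it fits entirely inside the input are evaluated.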
MathUtils.convnValid:
// "Valid" convolution: the kernel is only placed where it fits entirely inside the matrix.
public static double[][] convnValid(final double[][] matrix, double[][] kernel) {
    int m = matrix.length;
    int n = matrix[0].length;
    final int km = kernel.length;
    final int kn = kernel[0].length;
    // Output size: (m - km + 1) rows by (n - kn + 1) columns.
    int kns = n - kn + 1;
    final int kms = m - km + 1;
    final double[][] outMatrix = new double[kms][kns];

    for (int i = 0; i < kms; i++) {
        for (int j = 0; j < kns; j++) {
            // Sum of element-wise products between the kernel and the window at (i, j).
            double sum = 0.0;
            for (int ki = 0; ki < km; ki++) {
                for (int kj = 0; kj < kn; kj++) {
                    sum += matrix[i + ki][j + kj] * kernel[ki][kj];
                }
            }
            outMatrix[i][j] = sum;
        }
    }
    return outMatrix;
}
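`convnValid` slides the kernel one cell at a time, whereas the figure is described with a stride of 2. The following is a minimal sketch of a strided variant, under the assumption that the stride is the same in both directions. The class `StridedConvSketch` and the method `convnValidStrided` are hypothetical and do not exist in the original `MathUtils`.

```java
/** Hypothetical strided variant of convnValid; not part of the original MathUtils. */
final class StridedConvSketch {

    static double[][] convnValidStrided(double[][] matrix, double[][] kernel, int stride) {
        int m = matrix.length;
        int n = matrix[0].length;
        int km = kernel.length;
        int kn = kernel[0].length;
        // Integer division gives floor((m - km) / stride) + 1 window positions per axis.
        int outRows = (m - km) / stride + 1;
        int outCols = (n - kn) / stride + 1;
        double[][] out = new double[outRows][outCols];

        for (int i = 0; i < outRows; i++) {
            for (int j = 0; j < outCols; j++) {
                double sum = 0.0;
                for (int ki = 0; ki < km; ki++) {
                    for (int kj = 0; kj < kn; kj++) {
                        // The window for output cell (i, j) starts at (i * stride, j * stride).
                        sum += matrix[i * stride + ki][j * stride + kj] * kernel[ki][kj];
                    }
                }
                out[i][j] = sum;
            }
        }
        return out;
    }
}
```

With a 5*5 input and a 3*3 kernel, a stride of 1 yields a 3*3 output and a stride of 2 yields a 2*2 output.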
Note: the result of the convolution must be passed through an activation function. The activation chosen here is the sigmoid, $1/(1+e^{-z})$.
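For reference, the sigmoid and the derivative that backpropagation needs can be sketched as below. The class `SigmoidSketch` and its method names are hypothetical; the original code calls `MathUtils.sigmod`, whose body is not shown in this article.

```java
/** Hypothetical helper for illustration; the original uses MathUtils.sigmod. */
final class SigmoidSketch {

    /** sigmoid(z) = 1 / (1 + e^{-z}), mapping any real z into (0, 1). */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Derivative of the sigmoid expressed through its output a = sigmoid(z): a * (1 - a). */
    static double sigmoidDerivativeFromOutput(double a) {
        return a * (1.0 - a);
    }
}
```

Expressing the derivative through the output is convenient because the forward pass already stores the activated maps.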
1.3 Pooling layer
1.3.1 Average pooling
The code uses average pooling:
// Computes each output map of a pooling layer by down-sampling the matching input map.
private void setSampOutput(final CnnLayer paraLayer, final CnnLayer paraLastLayer) {
    for (int i = 0; i < paraLayer.outMapNum; i++) {
        // Pooling is applied map by map: output map i depends only on input map i.
        double[][] lastMap = paraLastLayer.getMap(i);
        Size scaleSize = paraLayer.getScaleSize();
        double[][] sampMatrix = MathUtils.scaleMatrix(lastMap, scaleSize);
        paraLayer.setMapValue(i, sampMatrix);
    }
}
MathUtils.scaleMatrix:
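// Average pooling: each non-overlapping scale.width * scale.height block is replaced by its mean.
// Note that this code divides the row count by scale.width and the column count by scale.height.
public static double[][] scaleMatrix(final double[][] matrix, final Size scale) {
    int m = matrix.length;
    int n = matrix[0].length;
    final int sm = m / scale.width;
    final int sn = n / scale.height;
    // The matrix must split into whole blocks.
    if (sm * scale.width != m || sn * scale.height != n)
        throw new RuntimeException("scale matrix: the scale does not divide the matrix size evenly");
    final double[][] outMatrix = new double[sm][sn];
    final int size = scale.width * scale.height;

    for (int i = 0; i < sm; i++) {
        for (int j = 0; j < sn; j++) {
            // Sum the block that maps to output cell (i, j), then average it.
            double sum = 0.0;
            for (int si = i * scale.width; si < (i + 1) * scale.width; si++) {
                for (int sj = j * scale.height; sj < (j + 1) * scale.height; sj++) {
                    sum += matrix[si][sj];
                }
            }
            outMatrix[i][j] = sum / size;
        }
    }
    return outMatrix;
}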
1.3.2 Max pooling
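Max pooling keeps the largest value of each block instead of the mean. The article's code does not include it, so the following is only a sketch modelled on `scaleMatrix`. The class `MaxPoolSketch`, the method `maxPool` and the parameters `scaleRows` and `scaleCols` are hypothetical names introduced here.

```java
/** Hypothetical max-pooling counterpart of scaleMatrix; not part of the original code. */
final class MaxPoolSketch {

    static double[][] maxPool(double[][] matrix, int scaleRows, int scaleCols) {
        int m = matrix.length;
        int n = matrix[0].length;
        if (m % scaleRows != 0 || n % scaleCols != 0) {
            throw new IllegalArgumentException("scale does not divide the matrix size evenly");
        }
        int sm = m / scaleRows;
        int sn = n / scaleCols;
        double[][] out = new double[sm][sn];

        for (int i = 0; i < sm; i++) {
            for (int j = 0; j < sn; j++) {
                // Scan the block that maps to output cell (i, j) and keep its maximum.
                double max = Double.NEGATIVE_INFINITY;
                for (int si = i * scaleRows; si < (i + 1) * scaleRows; si++) {
                    for (int sj = j * scaleCols; sj < (j + 1) * scaleCols; sj++) {
                        max = Math.max(max, matrix[si][sj]);
                    }
                }
                out[i][j] = max;
            }
        }
        return out;
    }
}
```

A real implementation would usually also record where each maximum came from, because the backward pass needs that position.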
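1.4 Output layer
The output layer also uses convolution: each two-dimensional array of the previous layer is convolved with a kernel of exactly the same size as that array, so each convolution collapses to a single value. There is a separate kernel between the i-th two-dimensional array of the previous layer and the j-th array of the output layer. In the code the output layer consists of 10 one-dimensional arrays, one per class. If the array at position O_i holds 1 (and all the other positions hold 0), then after the convolution the image is assigned, among the classes 0~9, to class O_i (0<=O_i<=9).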
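In practice the 10 outputs are sigmoid values rather than exact 0s and 1s. A minimal sketch of the encoding, assuming that the target is a one-hot vector and that the predicted class is the index of the largest output: the class `OutputLayerSketch` and both method names are hypothetical.

```java
/** Hypothetical helper for illustration; not part of the original code base. */
final class OutputLayerSketch {

    /** Builds the target vector: 1 at the position of the label, 0 everywhere else. */
    static double[] oneHot(int label, int numClasses) {
        double[] target = new double[numClasses];
        target[label] = 1.0;
        return target;
    }

    /** Returns the index of the largest output, taken here as the predicted class. */
    static int predictedClass(double[] outputs) {
        int best = 0;
        for (int i = 1; i < outputs.length; i++) {
            if (outputs[i] > outputs[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

For example, `oneHot(3, 10)` is the target for the digit 3, and a prediction counts as correct when `predictedClass` of the network outputs equals the label.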
2. Backpropagation
2.1 Backpropagation through the pooling layer
2.1.1 Backpropagation through average pooling
The code uses average pooling.
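Because every cell of a block contributes equally to the block's mean, the gradient of one pooled cell is spread evenly over the block it came from. The following is a sketch of that rule, not the article's own implementation (which is not shown). The class `AvgPoolBackwardSketch`, the method `avgPoolBackward` and its parameters are hypothetical.

```java
/** Hypothetical sketch of the average-pooling backward pass; not the original code. */
final class AvgPoolBackwardSketch {

    /**
     * delta holds the gradient with respect to each pooled cell.
     * The result holds the gradient with respect to each cell of the pooling input.
     */
    static double[][] avgPoolBackward(double[][] delta, int scaleRows, int scaleCols) {
        int rows = delta.length * scaleRows;
        int cols = delta[0].length * scaleCols;
        double size = scaleRows * scaleCols;
        double[][] grad = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // Input cell (i, j) belongs to pooled cell (i / scaleRows, j / scaleCols)
                // and contributed 1 / size of its value to that mean.
                grad[i][j] = delta[i / scaleRows][j / scaleCols] / size;
            }
        }
        return grad;
    }
}
```

This is the same as taking the Kronecker product of `delta` with a block of ones and dividing by the block size.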
2.1.2 Backpropagation through max pooling
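For max pooling, only the cell that held the maximum of a block influenced the output, so the whole gradient of the pooled cell is routed to that cell and every other cell in the block receives 0. The sketch below recomputes the position of the maximum from the forward input. The class `MaxPoolBackwardSketch` and the method `maxPoolBackward` are hypothetical, and sending the gradient to the first maximum when there is a tie is an assumption made here.

```java
/** Hypothetical sketch of the max-pooling backward pass; not part of the original code. */
final class MaxPoolBackwardSketch {

    /**
     * input is the matrix that was pooled in the forward pass.
     * delta holds the gradient with respect to each pooled cell.
     */
    static double[][] maxPoolBackward(double[][] input, double[][] delta,
            int scaleRows, int scaleCols) {
        double[][] grad = new double[input.length][input[0].length];
        for (int i = 0; i < delta.length; i++) {
            for (int j = 0; j < delta[0].length; j++) {
                // Locate the maximum of the block that produced pooled cell (i, j).
                int bestRow = i * scaleRows;
                int bestCol = j * scaleCols;
                for (int si = i * scaleRows; si < (i + 1) * scaleRows; si++) {
                    for (int sj = j * scaleCols; sj < (j + 1) * scaleCols; sj++) {
                        if (input[si][sj] > input[bestRow][bestCol]) {
                            bestRow = si;
                            bestCol = sj;
                        }
                    }
                }
                // Only the winning cell receives the gradient; the rest stay at 0.
                grad[bestRow][bestCol] = delta[i][j];
            }
        }
        return grad;
    }
}
```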
2.2 Backpropagation through the convolutional layer
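The following sketch shows the gradients of a single valid convolution as computed by `convnValid` in section 1.2, for one input map and one kernel. It assumes that `delta` is the gradient with respect to the convolution result before the activation, that is, the gradient of the activated map already multiplied element-wise by the sigmoid derivative a*(1-a). The class `ConvBackwardSketch` and its methods are hypothetical and are not the article's implementation.

```java
/** Hypothetical sketch of the backward pass of one valid convolution; not the original code. */
final class ConvBackwardSketch {

    /** Kernel gradient: gradKernel[ki][kj] = sum over (i, j) of input[i+ki][j+kj] * delta[i][j]. */
    static double[][] kernelGradient(double[][] input, double[][] delta, int km, int kn) {
        double[][] grad = new double[km][kn];
        for (int ki = 0; ki < km; ki++) {
            for (int kj = 0; kj < kn; kj++) {
                double sum = 0.0;
                for (int i = 0; i < delta.length; i++) {
                    for (int j = 0; j < delta[0].length; j++) {
                        sum += input[i + ki][j + kj] * delta[i][j];
                    }
                }
                grad[ki][kj] = sum;
            }
        }
        return grad;
    }

    /** Input gradient: each delta[i][j] is scattered back over the window it was computed from. */
    static double[][] inputGradient(double[][] delta, double[][] kernel) {
        int km = kernel.length;
        int kn = kernel[0].length;
        double[][] grad = new double[delta.length + km - 1][delta[0].length + kn - 1];
        for (int i = 0; i < delta.length; i++) {
            for (int j = 0; j < delta[0].length; j++) {
                for (int ki = 0; ki < km; ki++) {
                    for (int kj = 0; kj < kn; kj++) {
                        grad[i + ki][j + kj] += delta[i][j] * kernel[ki][kj];
                    }
                }
            }
        }
        return grad;
    }

    /** Bias gradient: the bias is added to every cell, so its gradient is the sum of delta. */
    static double biasGradient(double[][] delta) {
        double sum = 0.0;
        for (double[] row : delta) {
            for (double v : row) {
                sum += v;
            }
        }
        return sum;
    }
}
```

With several input and output maps, the kernel gradient is computed once per (i, j) kernel, and the input gradient of map i is the sum of `inputGradient` over all output maps j that used it.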
3. References