The last layer's activation has to be Sigmoid, since the predicted similarity score should fall between 0 and 1. Architecture: d1 (ReLU) → d2 (ReLU) → d3 (ReLU) → d4 (Sigmoid). A PyTorch sketch of this head follows the logs below.

With ReLU on the first three layers:
- Too many of the third linear layer's outputs are negative (about 1:2), which leaves too many dead neurons.
- The fourth linear layer's outputs are all negative and nearly identical, so after Sigmoid every score comes out the same.

Variants tried:
- Switch the first three layers to Tanh: d4 (the Linear(10, 1) layer) still outputs all-negative, nearly identical values; the post-Sigmoid scores differ slightly, but after batch 100 they are all identical.
- Remove one layer: d3 (now Linear(20, 1)) outputs are very close together; the post-Sigmoid scores differ slightly, but after batch 100 they are all identical.
- Add Dropout and LayerNorm: at batch 100 the scores are not yet fully identical, but by batch 1000 they collapse to a single value.

Open question: why are the last layer's raw outputs (d4) negative, and how can they be made positive? (The initialization sketch at the end of these notes touches on this.)

In the logs below, "d4" is the raw last-layer output for a 16-sample batch, "scorer" is the corresponding post-Sigmoid score, and each "batch N" row lists the 10 scores observed at that training batch.

Tanh:
d4: [-0.2611, -0.2636, -0.2638, -0.2642, -0.2643, -0.2631, -0.2630, -0.2643, -0.2645, -0.2639, -0.2641, -0.2640, -0.2651, -0.2641, -0.2644, -0.2631]
scorer: [0.4351, 0.4345, 0.4344, 0.4343, 0.4343, 0.4346, 0.4346, 0.4343, 0.4343, 0.4344, 0.4344, 0.4344, 0.4341, 0.4344, 0.4343, 0.4346]
batch 100: all 0.2710
batch 200: all 0.2698
batch 300: all 0.2688
batch 400: all 0.2676

Tanh, dropout=0.5:
d4: [-0.0856, -0.3306, -0.3250, -0.3348, -0.3605, -0.3433, -0.1659, -0.2772, -0.1677, -0.2817, -0.1912, -0.3112, -0.2635, -0.3253, -0.3857, -0.5408]
scorer: [0.4786, 0.4181, 0.4195, 0.4171, 0.4108, 0.4150, 0.4586, 0.4311, 0.4582, 0.4300, 0.4523, 0.4228, 0.4345, 0.4194, 0.4047, 0.3680]
Scores decline gradually over training.
batch 100: [0.4045, 0.4085, 0.4012, 0.4024, 0.4010, 0.3995, 0.4052, 0.4018, 0.4075, 0.3990]

Tanh, dropout=0.2:
d4: [-0.2165, -0.3269, -0.2281, -0.3045, -0.3261, -0.2753, -0.2394, -0.2114, -0.2075, -0.2799, -0.2755, -0.2504, -0.2852, -0.2475, -0.2447, -0.3423]
scorer: [0.4461, 0.4190, 0.4432, 0.4245, 0.4192, 0.4316, 0.4404, 0.4474, 0.4483, 0.4305, 0.4315, 0.4377, 0.4292, 0.4384, 0.4391, 0.4152]
Scores decline gradually over training.
batch 100: [0.4045, 0.4085, 0.4012, 0.4024, 0.4010, 0.3995, 0.4052, 0.4018, 0.4075, 0.3990]
batch 200: [0.3888, 0.3889, 0.3910, 0.3886, 0.3885, 0.3889, 0.3884, 0.3888, 0.3889, 0.3881]
batch 300: [0.3807, 0.3797, 0.3797, 0.3799, 0.3798, 0.3798, 0.3799, 0.3796, 0.3797, 0.3797]
batch 400: [0.3365, 0.3365, 0.3364, 0.3365, 0.3365, 0.3384, 0.3365, 0.3451, 0.3432, 0.3366]
batch 500: [0.2309, 0.2650, 0.2665, 0.2648, 0.2544, 0.2404, 0.2426, 0.2422, 0.2428, 0.2406]
batch 600: [0.1789, 0.1790, 0.1806, 0.1808, 0.1787, 0.1791, 0.1784, 0.1804, 0.1784, 0.1789]
batch 700: [0.1456, 0.1601, 0.1615, 0.1605, 0.1616, 0.1597, 0.1518, 0.1601, 0.1609, 0.1515]
batch 800: [0.1308, 0.1270, 0.1271, 0.1270, 0.1218, 0.1227, 0.1278, 0.1270, 0.1218, 0.1270]
batch 900: [0.1092, 0.1146, 0.1090, 0.1092, 0.1146, 0.1092, 0.1092, 0.1146, 0.1092, 0.1092]
batch 1000: [0.1024, 0.1025, 0.1025, 0.1024, 0.1031, 0.1025, 0.1025, 0.1031, 0.1025, 0.1025]

Tanh, dropout=0.2, norm(d3):
d4: [-0.8204, -1.2999, -0.8628, -1.2150, -1.2989, -1.0138, -0.9975, -0.8131, -0.8571, -1.1200, -1.0964, -1.0004, -1.1360, -1.0196, -0.9770, -1.3880]
scorer: [0.3057, 0.2142, 0.2967, 0.2288, 0.2144, 0.2662, 0.2694, 0.3072, 0.2979, 0.2460, 0.2504, 0.2689, 0.2431, 0.2651, 0.2735, 0.1997]
Scores decline gradually; by batch 1000 they are all identical.
batch 100: [0.4238, 0.4116, 0.4157, 0.4116, 0.4099, 0.4055, 0.4046, 0.4102, 0.4120, 0.4067]
batch 200: [0.4078, 0.4121, 0.4110, 0.4088, 0.4077, 0.4128, 0.4080, 0.4078, 0.4149, 0.4126]
batch 300: [0.4157, 0.4151, 0.4158, 0.4104, 0.4095, 0.4161, 0.4103, 0.4161, 0.4151, 0.4150]
batch 400: [0.3977, 0.3976, 0.3974, 0.3975, 0.3975, 0.3975, 0.3975, 0.3976, 0.3976, 0.3976]
batch 500: [0.3899, 0.4089, 0.4087, 0.4074, 0.3968, 0.4004, 0.4019, 0.4002, 0.4002, 0.4004]
batch 600: [0.3881, 0.3882, 0.3883, 0.3881, 0.3882, 0.3881, 0.3881, 0.3882, 0.3881, 0.3881]
batch 700: [0.3878, 0.3922, 0.3923, 0.3923, 0.3922, 0.3920, 0.3880, 0.3922, 0.3920, 0.3880]
batch 800: [0.3864, 0.3863, 0.3863, 0.3863, 0.3863, 0.3863, 0.3863, 0.3863, 0.3863, 0.3863]
batch 900: [0.3862, 0.3863, 0.3862, 0.3862, 0.3863, 0.3862, 0.3862, 0.3863, 0.3862, 0.3862]
batch 1000: all 0.3861
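For reference, a minimal PyTorch sketch of the scorer head these notes describe. The logs only suggest the tail of the network (d3 as Linear(20, 10), since the three-layer variant has d3 as Linear(20, 1), and d4 as Linear(10, 1)); the input width, the first two hidden widths, and the class and argument names here are all assumptions:

```python
import torch
import torch.nn as nn

class Scorer(nn.Module):
    """Four linear layers; d4 + Sigmoid maps to a similarity score in (0, 1)."""

    def __init__(self, in_dim=128, p_drop=0.2):  # in_dim is hypothetical
        super().__init__()
        self.d1 = nn.Linear(in_dim, 64)  # 64 is a hypothetical width
        self.d2 = nn.Linear(64, 20)
        self.d3 = nn.Linear(20, 10)
        self.d4 = nn.Linear(10, 1)
        self.act = nn.Tanh()             # ReLU in the original setup
        self.drop = nn.Dropout(p_drop)   # the dropout=0.5 / 0.2 variants
        self.norm = nn.LayerNorm(10)     # the norm(d3) variant

    def forward(self, x):
        x = self.drop(self.act(self.d1(x)))
        x = self.drop(self.act(self.d2(x)))
        x = self.norm(self.act(self.d3(x)))
        return torch.sigmoid(self.d4(x))  # score in (0, 1)
```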
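The dead-ReLU observation ("too many negative outputs after the third linear layer") can be measured directly. A probe sketch, assuming the attribute names d1..d4 from the sketch above and ignoring dropout and norm for simplicity:

```python
import torch

@torch.no_grad()
def neg_fraction(model, x):
    # Share of negative outputs per linear layer; a high share on d1..d3
    # means most ReLU units are dead for this batch.
    h = x
    for name in ("d1", "d2", "d3"):
        out = getattr(model, name)(h)
        print(f"{name}: {(out < 0).float().mean().item():.2f} negative")
        h = torch.relu(out)
    out = model.d4(h)
    print(f"d4: {(out < 0).float().mean().item():.2f} negative")
```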
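The "batch N" rows above were presumably collected by scoring a fixed probe batch at regular intervals during training; a sketch of that hook, where `fixed_batch` is a hypothetical held-out tensor of shape (10, in_dim):

```python
import torch

@torch.no_grad()
def log_scores(model, fixed_batch, step, every=100):
    # Print the 10 probe scores every `every` training batches.
    if step % every == 0:
        model.eval()  # disable dropout so the probe scores are deterministic
        print(step, [round(s, 4) for s in model(fixed_batch).squeeze(-1).tolist()])
        model.train()
```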
Effective parameter initialization, all layers initialized, ReLU (see the init sketch below):
d4: [-0.0124, -0.0057, -0.0211, -0.0152, -0.0276, -0.0111, -0.0277, -0.0158, -0.0212, -0.0121, -0.0270, -0.0094, -0.0104, -0.0207, -0.0081, -0.0259]
scorer: [0.4969, 0.4986, 0.4947, 0.4962, 0.4931, 0.4972, 0.4931, 0.4960, 0.4947, 0.4970, 0.4932, 0.4976, 0.4974, 0.4948, 0.4980, 0.4935]
batch 1: [0.4981, 0.4950, 0.4988, 0.5000, 0.4992, 0.4953, 0.4994, 0.4991, 0.4967, 0.4941]
batch 2: [0.4666, 0.4676, 0.4671, 0.4717, 0.4709, 0.4647, 0.4680, 0.4637, 0.4642, 0.4652]
batch 3: [0.4779, 0.4774, 0.4772, 0.4780, 0.4782, 0.4792, 0.4784, 0.4789, 0.4782, 0.4779]
batch 4: [0.5044, 0.4961, 0.5046, 0.5053, 0.5036, 0.4995, 0.5008, 0.5032, 0.5043, 0.5004]
batch 5: [0.5145, 0.5147, 0.5154, 0.5143, 0.5146, 0.5144, 0.5141, 0.5141, 0.5146, 0.5149]
batch 6: [0.5297, 0.5267, 0.5293, 0.5296, 0.5285, 0.5260, 0.5265, 0.5282, 0.5322, 0.5182]
batch 7: [0.5537, 0.5500, 0.5512, 0.5547, 0.5523, 0.5566, 0.5561, 0.5556, 0.5531, 0.5536]
batch 8: [0.5871, 0.5928, 0.5912, 0.5896, 0.5936, 0.5914, 0.5737, 0.5886, 0.5871, 0.5886]
batch 9: [0.6161, 0.6249, 0.6188, 0.6222, 0.6184, 0.6240, 0.6199, 0.6275, 0.6217, 0.6016]
batch 10: [0.6510, 0.6626, 0.6433, 0.6658, 0.6590, 0.6462, 0.6640, 0.6630, 0.6590, 0.6731]
batch 20: [0.9745, 0.9787, 0.9750, 0.9352, 0.9811, 0.9774, 0.9744, 0.9768, 0.9514, 0.9802]
batch 30: [0.8804, 0.7842, 0.8930, 0.8694, 0.8602, 0.8551, 0.8838, 0.8385, 0.8180, 0.8729]
batch 40: [0.7515, 0.7457, 0.7504, 0.7510, 0.7545, 0.7585, 0.7147, 0.7552, 0.7267, 0.7316]
batch 50: [0.7043, 0.7061, 0.6889, 0.6959, 0.6970, 0.6858, 0.7014, 0.6996, 0.6974, 0.6947]
batch 100: [0.8671, 0.9203, 0.8607, 0.9494, 0.8933, 0.9422, 0.8834, 0.8716, 0.9273, 0.8880]

Effective parameter initialization, only the mlp initialized:
d4:
scorer:
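A sketch of what the "initialize all layers" setup could look like. The notes do not say which scheme was actually used, so Kaiming initialization is an assumption here:

```python
import torch.nn as nn

def init_all(model):
    # Kaiming init suits the ReLU layers; biases start at zero, so d4's raw
    # outputs begin near 0 and the sigmoid scores near 0.5, which matches
    # the ~0.49 post-init scores logged above.
    for m in model.modules():
        if isinstance(m, nn.Linear):
            nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
            nn.init.zeros_(m.bias)
    # One way to push d4's raw outputs positive at the start (a suggestion,
    # not something the logged run did):
    # nn.init.constant_(model.d4.bias, 0.5)
```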