
[AI] Keras training problem: loss stays nan and accuracy is stuck at a fixed value

Problem description

While using VGG19 for a classification task, I ran into a problem: the loss stayed nan and the accuracy was stuck at a fixed value, as shown in the output below. Even adding automatic learning-rate reduction (ReduceLROnPlateau) did not solve it.

import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping

# Automatically reduce the learning rate when val_loss plateaus
# (default factor=0.1, which matches the 1e-3 -> 1e-4 -> 1e-5 drops in the log below)
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', patience=10, mode='auto')

earlystopping = EarlyStopping(monitor='val_accuracy', verbose=1, patience=30)

# VGG19: train the model with the callbacks above
history = model_vgg19.fit(train_gen,
                          validation_data=valid_gen,
                          epochs=200,
                          steps_per_epoch=len(train_gen),
                          validation_steps=len(valid_gen),
                          callbacks=[reduce_lr, earlystopping])

Output:

Epoch 1/200
176/176 [==============================] - 31s 177ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 2/200
176/176 [==============================] - 31s 176ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 3/200
176/176 [==============================] - 31s 175ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 4/200
176/176 [==============================] - 31s 176ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 5/200
176/176 [==============================] - 31s 175ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 6/200
176/176 [==============================] - 31s 175ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 7/200
176/176 [==============================] - 31s 174ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 8/200
176/176 [==============================] - 31s 175ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 9/200
176/176 [==============================] - 31s 173ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 10/200
176/176 [==============================] - 31s 173ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 11/200
176/176 [==============================] - 30s 173ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 12/200
176/176 [==============================] - 31s 175ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 13/200
176/176 [==============================] - 31s 173ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 14/200
176/176 [==============================] - 31s 174ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 15/200
176/176 [==============================] - 31s 174ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 16/200
176/176 [==============================] - 31s 174ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 17/200
176/176 [==============================] - 30s 173ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 18/200
176/176 [==============================] - 31s 173ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 19/200
176/176 [==============================] - 31s 173ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 20/200
176/176 [==============================] - 30s 173ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 21/200
176/176 [==============================] - 31s 174ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 22/200
176/176 [==============================] - 31s 174ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 23/200
176/176 [==============================] - 31s 175ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 24/200
176/176 [==============================] - 31s 177ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 25/200
176/176 [==============================] - 31s 178ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 26/200
176/176 [==============================] - 31s 177ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 27/200
176/176 [==============================] - 31s 177ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 28/200
176/176 [==============================] - 31s 173ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 29/200
176/176 [==============================] - 31s 173ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 30/200
176/176 [==============================] - 31s 174ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 31/200
176/176 [==============================] - 31s 174ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-06
Epoch 00031: early stopping

Solution: the problem was fixed by lowering the learning rate.

The most common cause is a learning rate that is too high. For classification problems, too high a learning rate makes the model "stubbornly" assign some samples to the wrong class, with the probability of the correct class at 0 (actually floating-point underflow), so cross entropy produces an infinite loss. Once that happens, differentiating the infinite loss with respect to the parameters yields NaN, and from then on all of the network's parameters become NaN. The fix is to lower the learning rate, or even set it to 0, and see whether the problem persists. If it disappears, the learning rate really was the culprit. If it persists, the freshly initialized network is already broken, which most likely means a bug in the implementation.

Author: 王赟 Maigo
Link: https://www.zhihu.com/question/62441748/answer/232522878 Source: Zhihu
Copyright belongs to the author. For commercial reposts, contact the author for authorization; for non-commercial reposts, credit the source.
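
Following that advice in Keras means recompiling the model with a much smaller, or even zero, learning rate before fitting again. A minimal diagnostic sketch, assuming the same model_vgg19 and generators as above and a categorical cross-entropy loss (the post does not show the original compile call):

import tensorflow as tf

# Diagnostic: recompile with a tiny (or zero) learning rate and run a few epochs.
# If the nan disappears, the learning rate was the culprit; if it persists even
# at lr=0, suspect the implementation or the data instead.
model_vgg19.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),  # or 0.0 for the test
    loss='categorical_crossentropy',  # assumed; match your original loss
    metrics=['accuracy'])

history = model_vgg19.fit(train_gen, validation_data=valid_gen, epochs=5)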

When training starts the network is randomly initialized, so nan appears easily. In that case lower the learning rate; try 0.1, 0.01, 0.001 until nan no longer appears. If it appears at every rate, the problem is probably in the network implementation.
Author: 峻许
Link: https://www.zhihu.com/question/62441748/answer/232704244
Source: Zhihu
Copyright belongs to the author. For commercial reposts, contact the author for authorization; for non-commercial reposts, credit the source.
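
That trial-and-error over 0.1, 0.01, 0.001 can be scripted. A sketch, where build_model is a hypothetical helper standing in for the VGG19 construction code the post does not show:

import math
import tensorflow as tf

def first_epoch_is_finite(lr):
    """Train one epoch at the given learning rate; return True if the loss stayed finite."""
    model = build_model()  # hypothetical: rebuilds a fresh VGG19 classifier
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr),
                  loss='categorical_crossentropy',  # assumed, as above
                  metrics=['accuracy'])
    history = model.fit(train_gen, validation_data=valid_gen, epochs=1, verbose=0)
    return math.isfinite(history.history['loss'][-1])

# Try progressively smaller rates until the loss stops going nan.
for lr in (0.1, 0.01, 0.001):
    if first_epoch_is_finite(lr):
        print(f'learning rate {lr} keeps the loss finite')
        break
else:
    print('nan at every rate: suspect the implementation, not the learning rate')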

Other possible causes of this problem

While searching for the cause I came across other possible causes of this problem, which I record here:

1

Possible cause:

A nan loss can come from several problems:

  • The learning rate is too high.
  • For classification problems, use categorical cross entropy.
  • For regression problems, there may be a division by zero; adding a small epsilon term may fix it.
  • The data itself may contain NaN; check the input and target with numpy.any(numpy.isnan(x)) (see the sketch after this list).
  • The target must be something the loss function can handle; for example, with a sigmoid activation the target should be greater than 0. Check the dataset for this as well.
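
For the data check in the fourth item, a sketch of what that inspection could look like; x_train and y_train are placeholders for your arrays, and with a Keras generator you inspect batch by batch instead:

import numpy as np

def report_nan(name, arr):
    """Print whether an array contains NaN or Inf values."""
    arr = np.asarray(arr, dtype=np.float64)
    print(f'{name}: nan={np.any(np.isnan(arr))} inf={np.any(np.isinf(arr))}')

report_nan('input', x_train)    # x_train / y_train: placeholders for your data
report_nan('target', y_train)

# With a generator such as train_gen, check one batch at a time:
for i in range(len(train_gen)):
    xb, yb = train_gen[i]
    if np.any(np.isnan(xb)) or np.any(np.isnan(yb)):
        print(f'batch {i} contains NaN')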

Author: 猪了个去
Link: https://www.zhihu.com/question/62441748/answer/232520044
Source: Zhihu
Copyright belongs to the author. For commercial reposts, contact the author for authorization; for non-commercial reposts, credit the source.

2

Possible cause:

nan stands for infinity or not-a-number. Infinity usually comes from dividing by 0 or from log(0), so ask whether, when the loss was computed, your network output was 0 and log(0) produced the nan.
————————————————
Copyright notice: this is an original article by CSDN blogger 「accumulate_zhang」 under the CC 4.0 BY-SA license; reposts must include the original source link and this notice.
Original link: https://blog.csdn.net/accumulate_zhang/article/details/79890624
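
The usual guard against log(0) is to clip predictions away from exact 0 and 1 before taking the log. Keras's built-in categorical_crossentropy already applies such an epsilon clip internally, so this mainly matters for hand-written losses; a sketch of a clipped version:

import tensorflow as tf

def safe_categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross entropy with predictions clipped away from 0 so log(0) never occurs."""
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    return -tf.reduce_sum(y_true * tf.math.log(y_pred), axis=-1)

# model.compile(optimizer='adam', loss=safe_categorical_crossentropy, metrics=['accuracy'])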

Solution: see link 1, link 2, and link 3.

3

Possible cause:

Source: https://discuss.gluon.ai/t/topic/8925/4
Another possibility is that random augmentation is mishandled on your images; some of the augmentations may be extreme. To verify, turn off some of the random crop and random expand in the train default aug.
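
In Keras that verification amounts to rebuilding the training generator with all random transforms removed. A minimal sketch, with placeholder paths and augmentation parameters (the post does not show the original generator code):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# The suspect: a generator with random augmentations enabled.
aug_datagen = ImageDataGenerator(rescale=1.0 / 255,
                                 rotation_range=20,
                                 width_shift_range=0.2,
                                 zoom_range=0.2,
                                 horizontal_flip=True)

# The control: the same pipeline with every random transform turned off.
plain_datagen = ImageDataGenerator(rescale=1.0 / 255)

train_gen = plain_datagen.flow_from_directory('data/train',        # placeholder path
                                              target_size=(224, 224),
                                              batch_size=32,
                                              class_mode='categorical')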

But even after I removed the image augmentation from preprocessing (data augmentation, ImageDataGenerator), the loss still turned nan partway through training. The output:

Epoch 1/200
176/176 [==============================] - 27s 84ms/step - loss: 10.1720 - accuracy: 0.0182 - val_loss: 7.0242 - val_accuracy: 0.0252 - lr: 0.0010
Epoch 2/200
176/176 [==============================] - 13s 74ms/step - loss: 10.0744 - accuracy: 0.0221 - val_loss: 5.1386 - val_accuracy: 0.0168 - lr: 0.0010
Epoch 3/200
176/176 [==============================] - 13s 75ms/step - loss: nan - accuracy: 0.0149 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 4/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 5/200
176/176 [==============================] - 13s 75ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 6/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 7/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 8/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 9/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 10/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 11/200
176/176 [==============================] - 13s 75ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 12/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 0.0010
Epoch 13/200
176/176 [==============================] - 13s 75ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 14/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 15/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 16/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 17/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 18/200
176/176 [==============================] - 13s 75ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 19/200
176/176 [==============================] - 13s 75ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 20/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 21/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 22/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-04
Epoch 23/200
176/176 [==============================] - 13s 75ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 24/200
176/176 [==============================] - 13s 75ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 25/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 26/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 27/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 28/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 29/200
176/176 [==============================] - 13s 74ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 30/200
176/176 [==============================] - 13s 75ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 31/200
176/176 [==============================] - 13s 75ms/step - loss: nan - accuracy: 0.0100 - val_loss: nan - val_accuracy: 0.0100 - lr: 1.0000e-05
Epoch 00031: early stopping

So the problem comes back to the original issue of adjusting the learning rate.

4

Possible cause:

Exploding gradients. Fixes: lower the learning rate, clip gradients, or normalize (a gradient-clipping sketch follows below).
————————————————
Copyright notice: this is an original article by CSDN blogger 「鹅似一颗筱筱滴石头~」 under the CC 4.0 BY-SA license; reposts must include the original source link and this notice.
Original link: https://blog.csdn.net/Drifter_Galaxy/article/details/104004267
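
Of the three fixes, gradient clipping is a one-line change in Keras: every optimizer accepts clipnorm or clipvalue. A sketch under the same assumptions as the earlier snippets:

import tensorflow as tf

# clipnorm rescales each gradient so its L2 norm never exceeds 1.0;
# clipvalue=0.5 would instead clamp every gradient element to [-0.5, 0.5].
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4, clipnorm=1.0)

model_vgg19.compile(optimizer=optimizer,
                    loss='categorical_crossentropy',  # assumed, as above
                    metrics=['accuracy'])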

On how to choose a learning rate:

2. The learning rate is generally inversely related to the depth of the network: the more layers, the smaller the learning rate usually needs to be.
3. Sometimes you can first train with a small learning rate for 5000+ iterations, save the resulting parameters, manually kill the training, and then fine-tune from those parameters with a larger learning rate for faster convergence (sketched below).
4. If you train with caffe and see no output while GPU utilization stays at 0 or very low: friend, it has already gone nan; your display interval is just set too high to show it. Set it to 1 and you will see it.
Author: 峻许
Link: https://www.zhihu.com/question/62441748/answer/232704244
Source: Zhihu
Copyright belongs to the author. For commercial reposts, contact the author for authorization; for non-commercial reposts, credit the source.
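
Tip 3 maps onto Keras as a two-stage run: warm up at a small rate, save the weights, then reload and continue at a larger one. A sketch under the same assumptions as above (filename is a placeholder):

import tensorflow as tf

# Stage 1: short warm-up at a small learning rate, then save the weights.
model_vgg19.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4),
                    loss='categorical_crossentropy', metrics=['accuracy'])
model_vgg19.fit(train_gen, validation_data=valid_gen, epochs=3)
model_vgg19.save_weights('warmup.weights.h5')  # placeholder filename

# Stage 2: reload the warmed-up weights and continue at a larger rate.
model_vgg19.load_weights('warmup.weights.h5')
model_vgg19.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3),
                    loss='categorical_crossentropy', metrics=['accuracy'])
model_vgg19.fit(train_gen, validation_data=valid_gen, epochs=200)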
