3-1 Loading the IMDB dataset
IMDB is a dataset of movie reviews:
- Built into the keras datasets module; the first download is about 80 MB
- 25k reviews for training and 25k for testing
- The data is already preprocessed (the word sequences have been converted into integer sequences)
- Each integer in the dataset stands for a specific word in a dictionary
Task: classify the text of each movie review as positive or negative (a binary classification problem)
import os
os.environ['KERAS_BACKEND']='tensorflow'
from keras.datasets import imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words = 10000)
train_labels[0]
max([max(sequence) for sequence in train_data])
9999
word_index = imdb.get_word_index()
reverse_word_index = dict(
    [(value, key) for (key, value) in word_index.items()])
decoded_review = ' '.join(
    [reverse_word_index.get(i - 3, '?') for i in train_data[0]])
print(reverse_word_index)
{34701: 'fawn', 52006: 'tsukino', 52007: 'nunnery', 16816: 'sonja', 63951: 'vani', 1408: 'woods', 16115: 'spiders', 2345: 'hanging', 2289: 'woody', 52008: 'trawling', 52009: "hold's", 11307: 'comically', 40830: 'localized', ...}
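A note on the `i - 3` in the decoding step above: in the data returned by `imdb.load_data`, the indices 0, 1 and 2 are reserved for "padding", "start of sequence" and "unknown", so every real word index is shifted up by 3. A minimal sketch of that shift, using a toy word index rather than the real IMDB one:

```python
# Toy word index standing in for imdb.get_word_index();
# the real one maps each word to its frequency rank.
word_index = {'the': 1, 'movie': 2, 'great': 3}
reverse_word_index = {v: k for k, v in word_index.items()}

# In the loaded data, indices 0-2 are reserved (padding / start / unknown),
# so every word index is shifted up by 3.
encoded = [1, 4, 5, 6]  # 1 is the "start of sequence" marker, then the shifted words

decoded = ' '.join(reverse_word_index.get(i - 3, '?') for i in encoded)
print(decoded)  # the reserved index 1 decodes to '?', the rest to real words
```

This is why decoded reviews typically begin with a `?`: the first token of every sequence is the reserved start marker.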
3-2 Encoding the integer sequences into a binary matrix
import numpy as np
def vectorize_sequence(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # Set the indices of results[i] that occur in this sequence to 1
        results[i, sequence] = 1.
    return results
x_train = vectorize_sequence(train_data)
x_test = vectorize_sequence(test_data)
print(x_train[0])
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
[0. 1. 1. ... 0. 0. 0.]
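To see what vectorize_sequence is doing at a glance, here is the same multi-hot encoding run at dimension 8 instead of 10000, on two made-up toy sequences:

```python
import numpy as np

def vectorize_sequence(sequences, dimension=8):
    # One row per sequence, with a 1 at every index that occurs in it.
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

toy = [[0, 3, 3, 5], [1, 2]]
print(vectorize_sequence(toy))
# Row 0 has ones at columns 0, 3 and 5; the duplicated 3 still yields a single 1,
# so word order and word counts are discarded by this encoding.
```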
3-3 Defining the model
from keras import models
from keras import layers
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
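The layer sizes above fix the parameter count: a Dense layer with n inputs and m units holds n*m weights plus m biases. A quick sanity check by hand (pure arithmetic, matching what model.summary() would report):

```python
# Dense layer parameters = inputs * units + units (the bias vector)
layer1 = 10000 * 16 + 16  # first hidden layer: 160016
layer2 = 16 * 16 + 16     # second hidden layer: 272
layer3 = 16 * 1 + 1       # sigmoid output layer: 17
total = layer1 + layer2 + layer3
print(total)  # 160305
```

Almost all of the capacity sits in the first layer, which is a direct consequence of the 10000-dimensional multi-hot input.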
3-4 Compiling the model
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
3-5 Configuring the optimizer
from keras import optimizers
model.compile(optimizer=optimizers.RMSprop(lr=0.001),
loss='binary_crossentropy',
metrics=['accuracy'])
3-6 Using custom losses and metrics
from keras import losses
from keras import metrics
model.compile(optimizer=optimizers.RMSprop(lr=0.001),
loss=losses.binary_crossentropy,
metrics=[metrics.binary_accuracy])
3-7 Setting aside a validation set
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]
3-8 Training the model
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
history = model.fit(partial_x_train,
partial_y_train,
epochs = 20, batch_size = 512,
validation_data = (x_val, y_val))
Train on 15000 samples, validate on 10000 samples
Epoch 1/20
15000/15000 [==============================] - 2s 119us/step - loss: 0.5112 - acc: 0.7905 - val_loss: 0.4020 - val_acc: 0.8501
Epoch 2/20
15000/15000 [==============================] - 1s 99us/step - loss: 0.3080 - acc: 0.9007 - val_loss: 0.3063 - val_acc: 0.8897
Epoch 3/20
15000/15000 [==============================] - 2s 100us/step - loss: 0.2267 - acc: 0.9267 - val_loss: 0.2805 - val_acc: 0.8907
Epoch 4/20
15000/15000 [==============================] - 2s 145us/step - loss: 0.1802 - acc: 0.9425 - val_loss: 0.3003 - val_acc: 0.8790
Epoch 5/20
15000/15000 [==============================] - 1s 96us/step - loss: 0.1463 - acc: 0.9545 - val_loss: 0.2805 - val_acc: 0.8887
Epoch 6/20
15000/15000 [==============================] - 1s 97us/step - loss: 0.1225 - acc: 0.9599 - val_loss: 0.2890 - val_acc: 0.8881
Epoch 7/20
15000/15000 [==============================] - 1s 97us/step - loss: 0.0999 - acc: 0.9707 - val_loss: 0.3170 - val_acc: 0.8803
Epoch 8/20
15000/15000 [==============================] - 1s 98us/step - loss: 0.0823 - acc: 0.9759 - val_loss: 0.3298 - val_acc: 0.8788
Epoch 9/20
15000/15000 [==============================] - 2s 143us/step - loss: 0.0689 - acc: 0.9821 - val_loss: 0.3529 - val_acc: 0.8780
Epoch 10/20
15000/15000 [==============================] - 2s 131us/step - loss: 0.0571 - acc: 0.9855 - val_loss: 0.3691 - val_acc: 0.8785
Epoch 11/20
15000/15000 [==============================] - 2s 129us/step - loss: 0.0461 - acc: 0.9895 - val_loss: 0.3985 - val_acc: 0.8743
Epoch 12/20
15000/15000 [==============================] - 2s 115us/step - loss: 0.0381 - acc: 0.9910 - val_loss: 0.4228 - val_acc: 0.8752
Epoch 13/20
15000/15000 [==============================] - 2s 103us/step - loss: 0.0325 - acc: 0.9931 - val_loss: 0.4614 - val_acc: 0.8757
Epoch 14/20
15000/15000 [==============================] - 1s 95us/step - loss: 0.0229 - acc: 0.9961 - val_loss: 0.4802 - val_acc: 0.8723
Epoch 15/20
15000/15000 [==============================] - 1s 96us/step - loss: 0.0183 - acc: 0.9977 - val_loss: 0.5156 - val_acc: 0.8705
Epoch 16/20
15000/15000 [==============================] - 1s 95us/step - loss: 0.0169 - acc: 0.9969 - val_loss: 0.5463 - val_acc: 0.8687
Epoch 17/20
15000/15000 [==============================] - 1s 97us/step - loss: 0.0120 - acc: 0.9985 - val_loss: 0.5793 - val_acc: 0.8675
Epoch 18/20
15000/15000 [==============================] - 1s 95us/step - loss: 0.0070 - acc: 0.9997 - val_loss: 0.6159 - val_acc: 0.8679
Epoch 19/20
15000/15000 [==============================] - 1s 93us/step - loss: 0.0084 - acc: 0.9987 - val_loss: 0.6450 - val_acc: 0.8680
Epoch 20/20
15000/15000 [==============================] - 1s 98us/step - loss: 0.0072 - acc: 0.9987 - val_loss: 0.6745 - val_acc: 0.8670
A call to model.fit() returns a History object. This object has a member, history, a dictionary containing all the data recorded during training:
history_dict = history.history
history_dict.keys()
dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])
3-9 Plotting the training and validation loss
import matplotlib.pyplot as plt
history_dict = history.history
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']
epochs = range(1, len(loss_values) + 1)
plt.plot(epochs, loss_values, 'bo', label = 'Training loss')
plt.plot(epochs, val_loss_values, 'b', label = 'Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
3-10 Plotting the training and validation accuracy
plt.clf()
acc = history_dict['acc']
val_acc = history_dict['val_acc']
plt.plot(epochs, acc, 'bo', label = 'Training acc')
plt.plot(epochs, val_acc, 'b', label = 'Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
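The two plots show the classic overfitting pattern: validation loss bottoms out around epoch 3-4 and then climbs steadily, while training loss keeps falling. Instead of rerunning with a hand-picked epoch count (as in 3-11 below), one common remedy is early stopping. Keras provides this as keras.callbacks.EarlyStopping; its core "patience" logic can be sketched in plain Python, here fed the first few validation losses from the 20-epoch run above:

```python
def best_epoch(val_losses, patience=2):
    # Return the 1-based epoch whose validation loss was lowest,
    # stopping the scan once the loss has failed to improve for
    # `patience` consecutive epochs.
    best, best_i, wait = float('inf'), 0, 0
    for i, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_i, wait = loss, i, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_i

# Validation losses from the first six epochs of the run above
val_losses = [0.4020, 0.3063, 0.2805, 0.3003, 0.2805, 0.2890]
print(best_epoch(val_losses))  # 3
```

In Keras itself this corresponds to passing something like callbacks=[EarlyStopping(monitor='val_loss', patience=2)] to model.fit(); the exact arguments depend on your Keras version.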
3-11 Retraining a model from scratch
model2 = models.Sequential()
model2.add(layers.Dense(16, activation = 'relu', input_shape = (10000,)))
model2.add(layers.Dense(16, activation = 'relu'))
model2.add(layers.Dense(1, activation = 'sigmoid'))
model2.compile(optimizer = 'rmsprop',
loss = 'binary_crossentropy',
metrics = ['accuracy'])
model2.fit(x_train, y_train, epochs = 4, batch_size = 512)
results = model2.evaluate(x_test, y_test)
print(results)
Epoch 1/4
25000/25000 [==============================] - 3s 124us/step - loss: 0.4411 - accuracy: 0.8178
Epoch 2/4
25000/25000 [==============================] - 4s 147us/step - loss: 0.2548 - accuracy: 0.9086
Epoch 3/4
25000/25000 [==============================] - 4s 145us/step - loss: 0.1971 - accuracy: 0.9281
Epoch 4/4
25000/25000 [==============================] - 4s 142us/step - loss: 0.1669 - accuracy: 0.9397
25000/25000 [==============================] - 6s 250us/step
[0.30119905314922335, 0.8805199861526489]
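The second value above (about 0.88) is the test accuracy. Because the output layer is a sigmoid, model2.predict returns one probability per review, and the class label comes from thresholding at 0.5. A minimal sketch with made-up probabilities (the real ones would come from model2.predict(x_test)):

```python
import numpy as np

# Hypothetical sigmoid outputs for four reviews, for illustration only
probs = np.array([0.92, 0.13, 0.50, 0.76])

# Threshold at 0.5: probabilities >= 0.5 are classified as positive (1)
labels = (probs >= 0.5).astype('int')
print(labels)  # [1 0 1 1]
```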
Final notes
Note: the code in this article comes from 《Python 深度学习》 (Deep Learning with Python), uploaded as electronic notes for study and reference only. The author has run all of it successfully; if anything was missed, please contact the author.