Official docs: https://tensorflow.google.cn/versions/r2.1/api_docs/python/tf/keras/layers/Attention
tf.keras.layers.Attention(use_scale=False, **kwargs)
Inputs are a query tensor of shape [batch_size, Tq, dim], a value tensor of shape [batch_size, Tv, dim] and a key tensor of shape [batch_size, Tv, dim]. The calculation follows these steps (a small sketch of the call conventions follows this list):
- Calculate scores with shape [batch_size, Tq, Tv] as a query-key dot product: scores = tf.matmul(query, key, transpose_b=True).
- Use scores to calculate a distribution with shape [batch_size, Tq, Tv]: distribution = tf.nn.softmax(scores).
- Use distribution to create a linear combination of value with shape [batch_size, Tq, dim]: return tf.matmul(distribution, value).
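One point worth making explicit before the example: when the layer is called with only two tensors, [query, value], the value tensor is also used as the key; a separate key only comes into play when three tensors [query, value, key] are passed. The sketch below illustrates both call forms with arbitrary made-up tensors (names and values are mine, not from the official example):

import tensorflow as tf
import numpy as np

# Illustrative tensors, shapes follow the documentation above.
query = tf.constant(np.random.rand(2, 1, 4), dtype=tf.float32)  # [batch, Tq, dim]
value = tf.constant(np.random.rand(2, 3, 4), dtype=tf.float32)  # [batch, Tv, dim]
key   = tf.constant(np.random.rand(2, 3, 4), dtype=tf.float32)  # [batch, Tv, dim]

attn = tf.keras.layers.Attention()

# Two-tensor form: value doubles as key.
out_qv = attn([query, value])        # shape [2, 1, 4]

# Three-tensor form: scores are computed against key, the output mixes value.
out_qvk = attn([query, value, key])  # shape [2, 1, 4]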
Example 1
import tensorflow as tf
import numpy as np
# query: [batch_size=1, Tq=1, dim=4]
query = tf.convert_to_tensor(np.asarray([[[1., 1., 1., 3.]]]))
# key_list: [batch_size=2, Tv=3, dim=4]; with the two-tensor call below it serves as both value and key
key_list = tf.convert_to_tensor(np.asarray([[[1., 1., 2., 4.], [4., 1., 1., 3.], [1., 1., 2., 1.]],
                                            [[1., 0., 2., 1.], [1., 2., 1., 2.], [1., 0., 2., 1.]]]))
query_value_attention_seq = tf.keras.layers.Attention()([query, key_list])
print('query shape:', query.shape)
print('key shape:', key_list.shape)
print('result 1:', query_value_attention_seq)
Result:
query shape: (1, 1, 4)
key shape: (2, 3, 4)
result 1: tf.Tensor(
[[[1.8067516 1. 1.7310829 3.730812 ]]
[[0.99999994 1.9293262 1.0353367 1.9646629 ]]], shape=(2, 1, 4), dtype=float32)
Now let's implement the same computation ourselves, following the steps from the documentation. Since the layer was called with only [query, key_list], key_list plays the role of both key and value; the batch dimension of 2 in the output comes from tf.matmul broadcasting the single-batch query against the two-batch key_list.
scores = tf.matmul(query, key_list, transpose_b=True)  # [2, 1, 3] query-key dot product
distribution = tf.nn.softmax(scores)                    # softmax over the Tv axis
result = tf.matmul(distribution, key_list)              # [2, 1, 4] weighted sum of values
print('result 2:', result)
The result is as follows; as expected, it matches the layer's output exactly.
result 2: tf.Tensor(
[[[1.8067516 1. 1.7310829 3.730812 ]]
[[0.99999994 1.9293262 1.0353367 1.9646629 ]]], shape=(2, 1, 4), dtype=float32)
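Finally, a hedged note on the use_scale argument from the constructor: as far as I understand the r2.1 source, use_scale=True adds a single trainable scalar that multiplies the scores before the softmax (it appears to start at 1, so an untrained layer should give the same output as use_scale=False); treat the sketch below as an assumption to verify against the linked docs rather than a definitive description. It continues with the query and key_list tensors from Example 1.

# Sketch only: use_scale=True is expected to add one trainable scalar 'scale'
# that multiplies the attention scores (assumption based on my reading of r2.1).
scaled_attn = tf.keras.layers.Attention(use_scale=True)
out_scaled = scaled_attn([query, key_list])
print('trainable weights:', scaled_attn.trainable_weights)  # expect a single scalar weight
print('result with use_scale=True (untrained):', out_scaled)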