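Reading Summary
Google's TPU is the forefather of AI ASIC chips. The sheer number of authors on the paper, and the rare appearance of an acknowledgments section, show that it must have taken a great deal of hard work to create. The paper was published in 2017; let's go back to that era and see what background gave rise to such a great work of art~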
In-Datacenter Performance Analysis of a Tensor Processing Unit
To appear at the 44th International Symposium on Computer Architecture (ISCA), Toronto, Canada, June 26, 2017.
0. Abstract
The abstract mainly covers what was used, what it was compared against, and what results were obtained.
1. Introduction to Neural Networks
A step called quantization transforms floating-point numbers into narrow integers, often just 8 bits, which are usually good enough for inference. Eight-bit integer multiplies can be 6X less energy and 6X less area than IEEE 754 16-bit floating-point multiplies, and the advantage for integer addition is 13X in energy and 38X in area [Dal16].
This passage introduces INT8, why INT8 is needed, and the concept of quantization.
NN (Neural Networks): there are currently three main types: multilayer perceptrons (MLPs), CNNs, and RNNs.
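To make the idea of quantization concrete, here is a minimal sketch of symmetric linear quantization to 8-bit integers. It is only an illustration: the function names `quantize` and `dequantize`, the per-tensor scale, and the sample weights are assumptions of this note, not the scheme described in the paper.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers with one shared scale (symmetric scheme)."""
    qmax = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp into [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integers."""
    return [x * scale for x in q]


weights = [0.5, -1.27, 0.03, 1.0]            # made-up example values
q, scale = quantize(weights)
restored = dequantize(q, scale)
print(q, scale, restored)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the sense in which 8 bits are "usually good enough" for inference.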
2. TPU Origin, Architecture, and Implementation
Systolic array
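A systolic array computes a matrix multiply by passing data between neighboring cells each cycle instead of re-reading memory. The sketch below is a small cycle-by-cycle simulation under assumptions made for this note: weights stay fixed in the cells, inputs enter from the left with a one-cycle skew per row, and partial sums flow downward. The function name `systolic_matmul` and the tiny sizes are illustrative; this is not the TPU's actual implementation.

```python
def systolic_matmul(X, W):
    """Simulate Y = X @ W on a weight-stationary systolic array.

    X is batch x k, W is k x m. Cell (r, c) permanently holds W[r][c].
    """
    batch, k, m = len(X), len(W), len(W[0])
    x_reg = [[0] * m for _ in range(k)]   # input value latched in each cell
    p_reg = [[0] * m for _ in range(k)]   # partial sum produced by each cell
    Y = [[0] * m for _ in range(batch)]

    for t in range(batch + k + m - 2):    # cycles until the last result drains
        new_x = [[0] * m for _ in range(k)]
        new_p = [[0] * m for _ in range(k)]
        for r in range(k):
            for c in range(m):
                if c == 0:
                    b = t - r             # row r is fed r cycles late (skew)
                    x_in = X[b][r] if 0 <= b < batch else 0
                else:
                    x_in = x_reg[r][c - 1]          # from the left neighbor
                p_in = p_reg[r - 1][c] if r > 0 else 0  # from the cell above
                new_x[r][c] = x_in
                new_p[r][c] = p_in + W[r][c] * x_in
        x_reg, p_reg = new_x, new_p
        # Finished sums leave the bottom row, one column per cycle of skew.
        for c in range(m):
            b = t - (k - 1) - c
            if 0 <= b < batch:
                Y[b][c] = p_reg[k - 1][c]
    return Y


result = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result)
```

Every cell only talks to its left and upper neighbors, so each input value is read from memory once and then reused as it travels across the array.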
4. Performance: Rooflines, Response-Time, and Throughput
To be continued