The MAE encoder

- Our encoder is a ViT, but applied only to the visible, unmasked patches.

- Just as in a standard ViT, our encoder embeds patches by a linear projection with added positional embeddings, and then processes the resulting set via a series of Transformer blocks.

- However, our encoder only operates on a small subset (e.g., 25%) of the full set. Masked patches are removed; no mask tokens are used.

- This allows us to train very large encoders with only a fraction of the compute and memory.

- The full set is handled by a lightweight decoder, described next.
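The encoder-side token selection described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `random_masking` and `embed_visible`, the toy sizes, and the random weights are all assumptions made for illustration, and the Transformer blocks themselves are left out. It only shows that, with a 75% mask ratio, the encoder receives 25% of the patches and no mask tokens.

```python
import numpy as np

def random_masking(num_patches, mask_ratio, rng):
    """Split patch indices into visible and masked sets (hypothetical helper)."""
    num_keep = int(num_patches * (1 - mask_ratio))
    perm = rng.permutation(num_patches)      # random order of all patch indices
    keep_idx = np.sort(perm[:num_keep])      # visible patches
    mask_idx = np.sort(perm[num_keep:])      # removed patches
    return keep_idx, mask_idx

def embed_visible(patches, w_proj, pos_embed, keep_idx):
    """Linear projection plus positional embeddings, for visible patches only."""
    # Masked patches are dropped before any computation; no mask tokens are added.
    return patches[keep_idx] @ w_proj + pos_embed[keep_idx]

rng = np.random.default_rng(0)
# Assumed toy sizes: 14x14 = 196 patches, each 16*16*3 = 768 values, 64-dim tokens.
num_patches, patch_dim, embed_dim = 196, 768, 64
patches = rng.standard_normal((num_patches, patch_dim))
w_proj = rng.standard_normal((patch_dim, embed_dim)) * 0.02
pos_embed = rng.standard_normal((num_patches, embed_dim)) * 0.02

keep_idx, mask_idx = random_masking(num_patches, mask_ratio=0.75, rng=rng)
visible_tokens = embed_visible(patches, w_proj, pos_embed, keep_idx)
print(visible_tokens.shape)  # (49, 64): only 25% of the 196 patches reach the encoder
```

In a real model, `visible_tokens` would then pass through the Transformer blocks. Because those blocks see 49 tokens instead of 196, their cost drops accordingly, which is where the compute and memory savings come from.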

The MAE decoder

- The input to the MAE decoder is the full set of tokens consisting of (i) encoded visible patches, and (ii) mask tokens.

- Each mask token is a shared, learned vector that indicates the presence of a missing patch to be predicted.

- We add positional embeddings to all tokens in this full set; without this, mask tokens would have no information about their location in the image.

- The decoder has another series of Transformer blocks.
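How the decoder input is assembled can be sketched in NumPy as follows. Again, this is a minimal illustration under stated assumptions rather than the paper's code: `build_decoder_input` is a hypothetical name, the sizes are toy values, and the mask token is set to zeros here although in practice it is a learned parameter. The decoder's Transformer blocks are omitted.

```python
import numpy as np

def build_decoder_input(encoded_visible, keep_idx, num_patches, mask_token, dec_pos_embed):
    """Assemble the full token set: encoded visible patches plus shared mask tokens."""
    # Start with the same shared mask token in every position ...
    full = np.tile(mask_token, (num_patches, 1))
    # ... then put the encoder outputs back at their original patch positions.
    full[keep_idx] = encoded_visible
    # Positional embeddings are added to ALL tokens, so that otherwise identical
    # mask tokens carry information about where they sit in the image.
    return full + dec_pos_embed

rng = np.random.default_rng(0)
num_patches, dim = 16, 8                          # assumed toy sizes
keep_idx = np.array([1, 5, 6, 12])                # positions of the visible patches
encoded_visible = rng.standard_normal((len(keep_idx), dim))
mask_token = np.zeros(dim)                        # learned in practice; zeros here
dec_pos_embed = rng.standard_normal((num_patches, dim))

decoder_tokens = build_decoder_input(
    encoded_visible, keep_idx, num_patches, mask_token, dec_pos_embed
)
print(decoder_tokens.shape)  # (16, 8): the full set, visible and masked
```

Before the positional embeddings are added, every masked position holds exactly the same vector, which is why the paper notes that mask tokens would otherwise have no information about their location.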

To be continued in the next post...

Added: 2022-05-16 11:19:45 Updated: 2022-05-16 11:20:26
