【Hide-and-Seek】"Hide-and-Seek: A Data Augmentation Technique for Weakly-Supervised Localization xxx"
ICCV-2017
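1 Background and Motivation
In weakly-supervised localization, a network that over-attends to the most discriminative features localizes poorly (models "tend to focus only on the most discriminative parts, and thus fail to cover the entire spatial extent of an object").
The authors propose the Hide-and-Seek data augmentation method to alleviate this problem.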
2 Related Work
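3 Advantages / Contributions
The authors propose the Hide-and-Seek data augmentation method, which performs well across different networks and different tasks.
4 Method
1) Hide-and-Seek for images
Randomly hide image patches: the image is divided into a grid and each grid cell is hidden with probability $p_{hide}$. Patches are masked during training only; nothing is hidden at test time.
Hiding during training but not during testing makes the two distributions differ (the first convolutional layer activations during training versus testing will have different distributions), whereas the distribution of $w^Tx$ should be roughly the same during training and testing.
Ha, so once training is done, $w$ is the same at test time but $x$ is not, and the distributions no longer match. What to do? The authors fill the hidden patches with the mean pixel value of the training dataset. The analysis goes as follows (see Figure 3).
When the convolution window lies entirely inside a visible patch, the output is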
$$\sum_{i=1}^{k \times k} w_i^T x_i$$
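The hiding step described in this section can be sketched in a few lines of NumPy. This is only a minimal illustration, not the authors' code: the function name `hide_patches`, its parameters, the toy random dataset, and the 16-pixel cell size are all assumptions made for the example. It hides each grid cell independently with probability `p_hide` during training, fills hidden cells with the dataset's mean RGB vector, and leaves the image untouched at test time.

```python
import numpy as np


def hide_patches(image, cell_size, p_hide, fill_value, rng, training=True):
    """Return a copy of `image` (H x W x C) with random grid cells overwritten.

    Each cell_size x cell_size cell is replaced by `fill_value` (a length-C
    vector) with probability p_hide. At test time nothing is hidden.
    """
    out = image.copy()
    if not training:
        return out
    height, width = image.shape[:2]
    for top in range(0, height, cell_size):
        for left in range(0, width, cell_size):
            if rng.random() < p_hide:
                # Broadcasting writes the same vector into every pixel of the cell.
                out[top:top + cell_size, left:left + cell_size] = fill_value
    return out


rng = np.random.default_rng(0)
# Toy stand-in for a training set: 8 random RGB images of size 64 x 64 (assumption).
dataset = rng.random((8, 64, 64, 3))
# Mean RGB vector over every pixel of every image, used as the fill value.
mean_rgb = dataset.mean(axis=(0, 1, 2))

image = dataset[0]
hidden = hide_patches(image, cell_size=16, p_hide=0.5, fill_value=mean_rgb, rng=rng)
print("fraction of pixels hidden:", np.mean(np.all(hidden == mean_rgb, axis=-1)))
```

Passing `training=False` returns an unmodified copy, which mirrors the "hide during training, do not hide during testing" rule above.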
Let $v$ be the vector representing the RGB value of every hidden pixel. The authors set it to the mean RGB vector of the images over the entire dataset, $\mu = \frac{1}{N_{pixels}} \sum_j x_j$, where $N_{pixels}$ is the total number of pixels in the dataset.
Why would this work? This is because in expectation, the output of a patch will be equal to that of an average-valued patch:
$$\mathbb{E}\left[\sum_{i=1}^{k \times k} w_i^T x_i\right] = \sum_{i=1}^{k \times k} w_i^T \mu$$
When $v$ is set to $\mu$, the output of a window that covers both visible and hidden pixels is, in expectation,
$$\sum_{m \in \mathrm{visible}} w_m^T x_m + \sum_{n \in \mathrm{hidden}} w_n^T v = \sum_{m \in \mathrm{visible}} w_m^T x_m + \sum_{n \in \mathrm{hidden}} w_n^T \mu \approx \sum_{m \in \mathrm{visible}} w_m^T \mu + \sum_{n \in \mathrm{hidden}} w_n^T \mu = \sum_{i=1}^{k \times k} w_i^T \mu$$
Does Hide-and-Seek introduce noise? One may wonder whether Hide-and-Seek introduces any artifacts in the learned convolutional filters due to the sharp transition between a hidden patch and a visible patch.
Also, the artificially created transitions will not be informative for the task at hand.
2) Object localization network architecture
Any architecture can be used. To make heatmap analysis convenient, the authors adopt the basic structure of the CAM method.
3) Hide-and-Seek for videos
Where patches are hidden in images, frames are hidden in videos (random frame sequences are hidden), so that the network learns the relevant frames corresponding to an action.
5 Experiments
5.1 Datasets and Metrics
ILSVRC 2016 for Weakly-supervised object localization
PASCAL VOC 2012 for Weakly-supervised semantic segmentation
THUMOS 2014 validation data for Weakly-supervised temporal action localization
CIFAR-10 / CIFAR-100 / large-scale ILSVRC for Image Classification
PASCAL 2011 for fully-supervised semantic segmentation
Cohn-Kanade database (CK+) for Emotion recognition
APPA-REAL / IMDB-WIKI for age estimation
UTKFace / IMDB-WIKI for gender estimation
DukeMTMC-reID and Market-1501 for ReID
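ps: Market-1501 was collected during summer whereas DukeMTMC-reID was created during winter
5.2 Weakly-supervised object localization
1) Quantitative object localization results
2) Comparison to alternate localization methods
A comparison against other weakly-supervised object localization methods.
3) Qualitative object localization results
Green boxes are the predictions and red boxes are the GT. This figure really delivers and makes the point directly!
The authors also show failure cases, with the gaps left for follow-up work clearly marked, haha, waiting for your next paper to battle it out.
In the third row, HaS hides the fence and the network learns the house instead; at least it picked up the ability to associate the two, haha.
The last row: reflections.
4) Further Analysis of Hide-and-Seek
<1> Do we need global average pooling?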
But combined with HaS, max pooling makes a comeback, likely due to max pooling being more robust to noise.
<2> Hide-and-Seek in convolutional layers
<3> Probability of hiding
5.3 Weakly-supervised semantic image segmentation
On the PASCAL 2012 val dataset, the result improves from 60.80 to 61.45 (mean IU). Mean IU is simply mIoU. This part is rather cursory: no table, just one sentence.
5.4 Weakly-supervised temporal action localization
A look at the predictions.
The second example is javelin throw.
5.5 Image classification
complementary to existing data augmentation techniques. Is this the charm of HaS? Some examples follow.
5.6 Semantic image segmentation
5.7 Face based emotion recognition and age/gender estimation
Here the authors stress that, relative to other data augmentation methods (random flipping and random cropping), Hide-and-Seek brings an additional advantage to these tasks: it does not break the spatial alignment of pixels across the augmented samples, which amounts to leaving the facial keypoint coordinates unchanged.
Emotion recognition
5.8 Person re-identification
more robust to occlusion
On the DukeMTMC-reID dataset, HaS does not do as well as ES. The authors' explanation is that Market-1501 was collected during summer whereas DukeMTMC-reID was created during winter; DukeMTMC-reID itself contains more occlusion, which weakens the benefit of the data augmentation.
6 Conclusion(own) / Future work
1) code: https://github.com/kkanshul/Hide-and-Seek
2) the patch sizes and hiding probabilities are hyperparameters; in the future they could be learned during training
3) Comparison of action detection methods: Temporal Action Localization Comparison
can predict the label of an action as well as its start and end time for a test video.
4) Comparison with Random Erasing: HaS is a more generalized form of Random Erasing. Hidden patches can also form a single continuous rectangle patch (albeit with very low probability), and HaS provides more variations in the types of occlusions.
5) "Object region mining with adversarial erasing: A simple classification to semantic segmentation approach" (CVPR-2017)
6) attribute localization
7) CAM: "Learning deep features for discriminative localization" (CVPR-2016)
8) "Discovering Class-Specific Pixels for Weakly-Supervised Semantic Segmentation" (arXiv-2017). DCSP combines saliency and CAM to obtain a pseudo ground-truth label map to train the network for semantic segmentation. See also: Saliency Maps, principles and a simple implementation (in PyTorch).
9) Excerpts from other write-ups of the paper: ICCV 2017: Hide-and-Seek, a data augmentation trick
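As a rough check on the remark in item 4) that hidden patches form a single continuous rectangle only "with very low probability", the sketch below computes that probability for grid-aligned rectangles. It is a back-of-the-envelope illustration, not anything from the paper: the function name `prob_single_rectangle`, the 4 x 4 grid, and the hiding probability 0.5 are assumptions chosen for the example, and Random Erasing itself is not restricted to grid-aligned boxes.

```python
from itertools import product


def prob_single_rectangle(grid_side, p_hide):
    """Probability that independently hidden cells form exactly one solid rectangle.

    The grid has grid_side x grid_side cells, each hidden with probability
    p_hide. Every non-empty axis-aligned rectangle corresponds to one distinct
    mask, so the probabilities of those masks can simply be added up.
    """
    n_cells = grid_side * grid_side
    total = 0.0
    for height, width in product(range(1, grid_side + 1), repeat=2):
        # Number of places a height x width rectangle fits in the grid.
        positions = (grid_side - height + 1) * (grid_side - width + 1)
        area = height * width
        # The rectangle's cells are hidden and all remaining cells are visible.
        total += positions * p_hide ** area * (1 - p_hide) ** (n_cells - area)
    return total


print(prob_single_rectangle(4, 0.5))
```

Under these assumed settings, 100 of the 65536 equally likely masks are solid rectangles, so the probability is about 0.0015; all other masks are occlusion patterns that a single erased rectangle cannot produce.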