[TIP 2020] Iterative Local-Global Collaboration Learning Towards One-Shot Video Person Re-ID


Introduction

  1. Proposes a local-global collaboration learning method for pseudo-label estimation.
  2. Introduces the variational information bottleneck into the loss as a regularization term, enabling the feature extractor to filter out identity-irrelevant factors.
  3. Adopts the same iterative training scheme as EUG.

Overview

| Paper | Abbreviation | Venue | Year | Baseline | Backbone | Datasets |
|---|---|---|---|---|---|---|
| Iterative Local-Global Collaboration Learning Towards One-Shot Video Person Re-Identification | VOLTA | IEEE TIP | 2020 | Y. Wu, Y. Lin, X. Dong, Y. Yan, W. Bian, Y. Yang, "Progressive learning for person re-identification with one example," IEEE Transactions on Image Processing 28 (6) (2019) 2872–2881 | ResNet-50 | DukeMTMC-VideoReID, MARS |

Paper link: https://ieeexplore-ieee-org-s.nudtproxy.yitlink.com/document/9211791
Source code: https://github.com/LgQu/VOLTA

Work Overview

  1. In this article, we focus on one-shot video Re-ID and present an iterative local-global collaboration learning approach to learn robust and discriminative person representations. Specifically, it jointly considers the global video information and local frame sequence information to better capture the diverse appearance of the person for feature learning and pseudo-label estimation.
  2. Moreover, as the cross-entropy loss may induce the model to focus on identity-irrelevant factors, we introduce the variational information bottleneck as a regularization term to train the model together. It can help filter undesirable information and characterize subtle differences among persons. Since accuracy cannot always be guaranteed for pseudo-labels, we adopt a dynamic selection strategy to select part of pseudo-labeled data with higher confidence to update the training set and re-train the learning model.

Results Overview

Extensive experiments on two public datasets, i.e., DukeMTMC-VideoReID and MARS, have verified the superiority of our model to several cutting-edge competitors.

Method Details

Framework


Fig. 2. The pipeline of our proposed one-shot video Re-ID approach VOLTA, comprising the following four processes: 1) Model Initialization. All the one-shot labeled video tracklets are utilized to initialize the learning model such that it has basic discrimination power. 2) Local-Global Label Propagation. We respectively utilize global and local video features to estimate the similarities between labeled and unlabeled samples, and then integrate two similarity values to predict pseudo labels for unlabeled video tracklets. 3) Dynamic Selection. Pseudo-labeled samples with higher confidence are selected to add into the training set via our dynamic threshold selection strategy. And 4) Model Update. Updating our feature learning model with the new training set. The GLP denotes the global label propagation, while the LLP denotes the local one. The bounding boxes and triangles with different colors refer to different identities. Moreover, the two red dotted boxes denote the frame pair with the highest similarity among the sequences.
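To make the four processes concrete, here is a minimal, self-contained Python toy of that outer loop. It is only a sketch under assumptions: random vectors stand in for CNN features, plain cosine similarity stands in for the local-global similarity detailed later, and all names (cosine, propagate, volta_loop) are hypothetical, not taken from the released code.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def propagate(feat, labeled):
    """Pseudo-label `feat` with the identity of its most similar labeled tracklet."""
    conf, label = max((cosine(feat, f), y) for f, y in labeled)
    return label, conf

def volta_loop(labeled, unlabeled, theta=0.9, delta=0.05, n_iters=5):
    """Toy version of the four processes: init, propagate, select, update."""
    train_set = list(labeled)                    # 1) initialize on one-shot labeled data
    for _ in range(n_iters):
        selected = []
        for feat in unlabeled:                   # 2) label propagation
            label, conf = propagate(feat, labeled)
            if conf >= theta:                    # 3) dynamic selection by confidence
                selected.append((feat, label))
        train_set = list(labeled) + selected     # 4) model update would retrain here
        theta -= delta                           # relax the threshold each iteration
    return train_set

# Toy usage: 3 one-shot identities, 10 unlabeled tracklets.
labeled = [(np.random.randn(8), i) for i in range(3)]
unlabeled = [np.random.randn(8) for _ in range(10)]
print(len(volta_loop(labeled, unlabeled)))
```

In the actual method, step 4 retrains the feature extractor on the enlarged set, and the similarity in step 2 combines global and local terms, as detailed in the method section below.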


Fig. 3. The framework of our feature extractor module, including the following two parts: 1) Local-Global Representation. For each video tracklet, the frame- level features are first extracted by a CNN model. The video-level feature is obtained by the global average pooling (GAP) operation. By stacking the two-level features, we could acquire the local-global feature representation. And 2) VIB-based Classification. Each column of the local-global feature is input to the VIB encoder to generate a latent vector. Afterwards, we combine the information and identity loss to train the feature learning model.

Implementation Details

  • Local-global representation. The T frame-level features are average-pooled into one video-level feature, which is then stacked with the T frame features to form a feature matrix M with T+1 columns. The original images gave the local-feature, global-feature, and local-global-feature definitions; a reconstruction follows below.
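A plausible reconstruction of the three definitions, in standard notation (the paper's exact symbols may differ): with $\phi$ the CNN backbone and $x_{i,t}$ the $t$-th frame of tracklet $i$,

$$ f_{i,t} = \phi(x_{i,t}), \qquad t = 1, \dots, T \quad \text{(local, frame-level features)} $$

$$ g_i = \frac{1}{T} \sum_{t=1}^{T} f_{i,t} \quad \text{(global, video-level feature via GAP)} $$

$$ M_i = \big[\, g_i,\ f_{i,1},\ \dots,\ f_{i,T} \,\big] \in \mathbb{R}^{d \times (T+1)} \quad \text{(local-global representation)} $$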

  • Information filtering: VIB (variational information bottleneck) classification. The loss consists of two parts, an identity loss L_id and an information loss L_in. L_in, the paper's distinctive design, is the KL divergence between the Gaussian distribution of the latent feature Z, whose mean and variance are predicted from the original feature, and a multivariate standard Gaussian. Z itself is sampled from that predicted mean and variance, and the cross-entropy computed on Z serves as the identity loss L_id. Here $m_{i,j}$ denotes the $j$-th column of $M_i$. The original images gave the encoder and the two loss terms; a reconstruction follows below.
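A plausible reconstruction following the standard deep-VIB formulation of Alemi et al. (the paper's exact symbols and weighting may differ): the VIB encoder predicts a mean $\mu(m_{i,j})$ and standard deviation $\sigma(m_{i,j})$ for each column $m_{i,j}$, and the latent vector is sampled via the reparameterization trick,

$$ z_{i,j} = \mu(m_{i,j}) + \sigma(m_{i,j}) \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I). $$

The information loss is the KL divergence from this predicted Gaussian to a standard multivariate Gaussian prior,

$$ \mathcal{L}_{in} = \mathrm{KL}\Big( \mathcal{N}\big(\mu(m_{i,j}),\, \mathrm{diag}\,\sigma^2(m_{i,j})\big) \,\big\|\, \mathcal{N}(0, I) \Big), $$

and the identity loss is the cross-entropy of the classifier applied to $z_{i,j}$,

$$ \mathcal{L}_{id} = -\log p\big(y_i \mid z_{i,j}\big), \qquad \mathcal{L} = \mathcal{L}_{id} + \beta\, \mathcal{L}_{in}, $$

with $\beta$ trading off compression against identity discrimination.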

  • Label estimation. The local-global idea is applied here as well: the similarity between an unlabeled and a labeled tracklet is a weighted combination of the cosine similarity of their global features and the maximum cosine similarity over their local (frame-level) feature pairs. The pseudo-label confidence is taken to be this similarity. The original images gave the global similarity, local similarity, combined similarity, label estimation, and confidence formulas; a reconstruction follows below.
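A plausible reconstruction, with $u$ an unlabeled and $l$ a labeled tracklet (the linear $\lambda$-weighting is an assumption; the paper may combine the two terms differently):

$$ s^{g}(u, l) = \cos\!\big(g_u, g_l\big) \quad \text{(global similarity)} $$

$$ s^{\ell}(u, l) = \max_{t,\,t'} \cos\!\big(f_{u,t}, f_{l,t'}\big) \quad \text{(local similarity: best-matching frame pair)} $$

$$ s(u, l) = \lambda\, s^{g}(u, l) + (1-\lambda)\, s^{\ell}(u, l) \quad \text{(combined similarity)} $$

$$ \hat{y}_u = y_{l^\ast}, \quad l^\ast = \operatorname*{arg\,max}_{l \in \mathcal{L}} s(u, l), \qquad c_u = s(u, l^\ast) \quad \text{(pseudo-label and confidence)} $$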

  • Label sampling. As in EUG, sampling is iterative and progressive; the difference is that no fixed sampling count is set. Instead, selection is controlled by a similarity threshold that is adjusted during the iterations, the overall principle being to relax the limit so that more pseudo-labeled samples are selected. The original images gave the threshold-based selection rule (a sample whose indicator equals 1 is selected) and the dynamic threshold update; a reconstruction follows below.
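A plausible reconstruction (the exact update rule is an assumption; the stated principle is only that the threshold is relaxed over iterations): at iteration $k$, an unlabeled tracklet $u$ is selected when the indicator

$$ \mathbb{1}\big[\, c_u \ge \theta^{(k)} \,\big] = 1, $$

and the threshold is then lowered for the next iteration, e.g.

$$ \theta^{(k+1)} = \theta^{(k)} - \Delta\theta, $$

so that progressively more pseudo-labeled samples enter the training set.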

Experimental Results

[Result tables: comparisons with state-of-the-art methods on DukeMTMC-VideoReID and MARS]

Overall Assessment

  • The local-global feature representation is a fairly simple idea, and it has appeared in other papers as well.
  • The ablation study suggests that removing the local information costs very little; the main contribution appears to be the VIB-based information filtering.
  • The VIB part looks somewhat complex; it is presumably adapted from prior work (the paper cites Alemi et al.'s deep variational information bottleneck).
  • Sampling gated by a similarity threshold has also been used in other papers.
  • Cosine similarity is used here; Euclidean distance would also work, and in earlier papers Euclidean distance has often performed better than cosine.
  • Overall, this is a good paper; both the method and the writing offer plenty worth learning from.
  • That said, I don't find its figures all that pretty.

Citation

@ARTICLE{9211791,
  author={Liu, Meng and Qu, Leigang and Nie, Liqiang and Liu, Maofu and Duan, Lingyu and Chen, Baoquan},
  journal={IEEE Transactions on Image Processing},
  title={Iterative Local-Global Collaboration Learning Towards One-Shot Video Person Re-Identification},
  year={2020},
  volume={29},
  pages={9360-9372},
  doi={10.1109/TIP.2020.3026625}}

