This column collects papers in computer vision. Date: September 7, 2021. Source: Paper Digest.
1, TITLE: Towards Expressive Communication with Internet Memes: A New Multimodal Conversation Dataset and Benchmark AUTHORS: Zhengcong Fei ; Zekang Li ; Jinchao Zhang ; Yang Feng ; Jie Zhou CATEGORY: cs.CL [cs.CL, cs.CV] HIGHLIGHT: In this paper, we propose a new task named Meme incorporated Open-domain Dialogue (MOD). To facilitate the MOD research, we construct a large-scale open-domain multimodal dialogue dataset incorporating abundant Internet memes into utterances.
2, TITLE: Data Efficient Masked Language Modeling for Vision and Language AUTHORS: Yonatan Bitton ; Gabriel Stanovsky ; Michael Elhadad ; Roy Schwartz CATEGORY: cs.CL [cs.CL, cs.CV, cs.LG] HIGHLIGHT: In this paper, we observe several key disadvantages of MLM in this setting.
3, TITLE: Spatiotemporal Inconsistency Learning for DeepFake Video Detection AUTHORS: Zhihao Gu CATEGORY: cs.CV [cs.CV] HIGHLIGHT: Specifically, we present a novel temporal modeling paradigm in TIM by exploiting the temporal difference over adjacent frames along with both horizontal and vertical directions.
4, TITLE: Utilizing Adversarial Targeted Attacks to Boost Adversarial Robustness AUTHORS: Uriya Pesso ; Koby Bibas ; Meir Feder CATEGORY: cs.CV [cs.CV] HIGHLIGHT: We propose a novel solution by adopting the recently suggested Predictive Normalized Maximum Likelihood.
5, TITLE: On Robustness of Generative Representations Against Catastrophic Forgetting AUTHORS: Wojciech Masarczyk ; Kamil Deja ; Tomasz Trzciński CATEGORY: cs.CV [cs.CV, cs.LG] HIGHLIGHT: In this work, we aim at answering this question by posing and validating a set of research hypotheses related to the specificity of representations built internally by neural models.
6, TITLE: ISyNet: Convolutional Neural Networks Design for AI Accelerator AUTHORS: ALEXEY LETUNOVSKIY et al. CATEGORY: cs.CV [cs.CV] HIGHLIGHT: For many years the main goal of the research was to improve the quality of models, even if the complexity was impractically high.
7, TITLE: Learning Object-Compositional Neural Radiance Field for Editable Scene Rendering AUTHORS: BANGBANG YANG et al. CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper, we present a novel neural scene rendering system, which learns an object-compositional neural radiance field and produces realistic rendering with editing capability for a clustered and real-world scene.
8, TITLE: Weakly Supervised Relative Spatial Reasoning for Visual Question Answering AUTHORS: Pratyay Banerjee ; Tejas Gokhale ; Yezhou Yang ; Chitta Baral CATEGORY: cs.CV [cs.CV, cs.CL, cs.LG] HIGHLIGHT: In this work, we evaluate the faithfulness of V&L models to such geometric understanding, by formulating the prediction of pair-wise relative locations of objects as a classification as well as a regression task.
9, TITLE: Audio-Visual Transformer Based Crowd Counting AUTHORS: Usman Sajid ; Xiangyu Chen ; Hasan Sajid ; Taejoon Kim ; Guanghui Wang CATEGORY: cs.CV [cs.CV] HIGHLIGHT: The paper proposes a new audiovisual multi-task network to address the critical challenges in crowd counting by effectively utilizing both visual and audio inputs for better modalities association and productive feature extraction.
10, TITLE: RiWNet: A Moving Object Instance Segmentation Network Being Robust in Adverse Weather Conditions AUTHORS: Chenjie Wang ; Chengyuan Li ; Bin Luo ; Wei Wang ; Jun Liu CATEGORY: cs.CV [cs.CV] HIGHLIGHT: We compare RiWNet to several other state-of-the-art methods in some challenging datasets, and RiWNet shows better performance especially under adverse weather conditions. Finally, in order to verify the effect of moving instance segmentation in different weather disturbances, we propose a VKTTI-moving dataset which is a moving instance segmentation dataset based on the VKTTI dataset, taking into account different weather scenes such as rain, fog, sunset, morning as well as overcast.
11, TITLE: GOHOME: Graph-Oriented Heatmap Output for Future Motion Estimation AUTHORS: Thomas Gilles ; Stefano Sabatini ; Dzmitry Tsishkou ; Bogdan Stanciulescu ; Fabien Moutarde CATEGORY: cs.CV [cs.CV, cs.RO] HIGHLIGHT: In this paper, we propose GOHOME, a method leveraging graph representations of the High Definition Map and sparse projections to generate a heatmap output representing the future position probability distribution for a given agent in a traffic scene.
12, TITLE: Sparse Spatial Attention Network for Semantic Segmentation AUTHORS: Mengyu Liu ; Hujun Yin CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper, we present a sparse spatial attention network (SSANet) to improve the efficiency of the spatial attention mechanism without sacrificing the performance.
13, TITLE: Stimuli-Aware Visual Emotion Analysis AUTHORS: Jingyuan Yang ; Jie Li ; Xiumei Wang ; Yuxuan Ding ; Xinbo Gao CATEGORY: cs.CV [cs.CV, cs.AI] HIGHLIGHT: Inspired by the Stimuli-Organism-Response (S-O-R) emotion model in psychological theory, we proposed a stimuli-aware VEA method consisting of three stages, namely stimuli selection (S), feature extraction (O) and emotion prediction (R).
14, TITLE: Robust Fine-tuning of Zero-shot Models AUTHORS: MITCHELL WORTSMAN et al. CATEGORY: cs.CV [cs.CV, cs.LG] HIGHLIGHT: We address this tension by introducing a simple and effective method for improving robustness: ensembling the weights of the zero-shot and fine-tuned models.
15, TITLE: Dual Transfer Learning for Event-based End-task Prediction Via Pluggable Event to Image Translation AUTHORS: Lin Wang ; Yujeong Chae ; Kuk-Jin Yoon CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper, we propose a simple yet flexible two-stream framework named Dual Transfer Learning (DTL) to effectively enhance the performance on the end-tasks without adding extra inference cost.
16, TITLE: A Comprehensive Approach for UAV Small Object Detection with Simulation-based Transfer Learning and Adaptive Fusion AUTHORS: Chen Rui ; Guo Youwei ; Zheng Huafei ; Jiang Hongyu CATEGORY: cs.CV [cs.CV] HIGHLIGHT: To tackle these problems, a novel comprehensive approach that combines transfer learning based on simulation data and adaptive fusion is proposed.
17, TITLE: Square Root Marginalization for Sliding-Window Bundle Adjustment AUTHORS: Nikolaus Demmel ; David Schubert ; Christiane Sommer ; Daniel Cremers ; Vladyslav Usenko CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper we propose a novel square root sliding-window bundle adjustment suitable for real-time odometry applications.
18, TITLE: Robust Attentive Deep Neural Network for Exposing GAN-generated Faces AUTHORS: Hui Guo ; Shu Hu ; Xin Wang ; Ming-Ching Chang ; Siwei Lyu CATEGORY: cs.CV [cs.CV] HIGHLIGHT: To address these shortcomings, we propose a robust, attentive, end-to-end network that can spot GAN-generated faces by analyzing their eye inconsistencies.
19, TITLE: Toward Realistic Single-View 3D Object Reconstruction with Unsupervised Learning from Multiple Images AUTHORS: Long-Nhat Ho ; Anh Tuan Tran ; Quynh Phung ; Minh Hoai CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper, we eliminate the symmetry requirement with a novel unsupervised algorithm that can learn a 3D reconstruction network from a multi-image dataset.
20, TITLE: Spatial Domain Feature Extraction Methods for Unconstrained Handwritten Malayalam Character Recognition AUTHORS: Jomy John CATEGORY: cs.CV [cs.CV] HIGHLIGHT: Spatial domain features suitable for recognition are chosen in this work.
21, TITLE: Does Melania Trump Have A Body Double from The Perspective of Automatic Face Recognition? AUTHORS: Khawla Mallat ; Fabiola Becerra-Riera ; Annette Morales-González ; Heydi Méndez-Vázquez ; Jean-Luc Dugelay CATEGORY: cs.CV [cs.CV, cs.LG] HIGHLIGHT: In this paper, we explore whether automatic face recognition can help in verifying widespread misinformation on social media, particularly conspiracy theories that are based on the existence of body doubles.
22, TITLE: Exploiting Spatial-Temporal Semantic Consistency for Video Scene Parsing AUTHORS: XINGJIAN HE et al. CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper, we propose a Spatial-Temporal Semantic Consistency method to capture class-exclusive context information.
23, TITLE: Stochastic Neural Radiance Fields: Quantifying Uncertainty in Implicit 3D Representations AUTHORS: Jianxiong Shen ; Adria Ruiz ; Antonio Agudo ; Francesc Moreno CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this context, we propose Stochastic Neural Radiance Fields (S-NeRF), a generalization of standard NeRF that learns a probability distribution over all the possible radiance fields modeling the scene.
24, TITLE: Less Is More: Lighter and Faster Deep Neural Architecture for Tomato Leaf Disease Classification AUTHORS: Sabbir Ahmed ; Md. Bakhtiar Hasan ; Tasnim Ahmed ; Redwan Karim Sony ; Md. Hasanul Kabir CATEGORY: cs.CV [cs.CV, cs.LG, I.4.9] HIGHLIGHT: This work proposes a lightweight transfer learning-based approach for detecting diseases from tomato leaves.
25, TITLE: Identification of Driver Phone Usage Violations Via State-of-the-Art Object Detection with Tracking AUTHORS: Steven Carrell ; Amir Atapour-Abarghouei CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG] HIGHLIGHT: In this work, we propose a custom-trained state-of-the-art object detector to work with roadside cameras to capture driver phone usage without the need for human intervention.
26, TITLE: Robust Event Detection Based on Spatio-Temporal Latent Action Unit Using Skeletal Information AUTHORS: Hao Xing ; Yuxuan Xue ; Mingchuan Zhou ; Darius Burschka CATEGORY: cs.CV [cs.CV, cs.HC, I.5.1; I.5.2; I.5.3] HIGHLIGHT: This paper proposes a novel dictionary learning approach to detect event action using skeletal information extracted from RGBD video.
27, TITLE: CTRL-C: Camera Calibration TRansformer with Line-Classification AUTHORS: JINWOO LEE et al. CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this work, we propose Camera calibration TRansformer with Line-Classification (CTRL-C), an end-to-end neural network-based approach to single image camera calibration, which directly estimates the camera parameters from an image and a set of line segments.
28, TITLE: Self-supervised Product Quantization for Deep Unsupervised Image Retrieval AUTHORS: Young Kyun Jang ; Nam Ik Cho CATEGORY: cs.CV [cs.CV] HIGHLIGHT: To tackle these issues, we propose the first deep unsupervised image retrieval method dubbed Self-supervised Product Quantization (SPQ) network, which is label-free and trained in a self-supervised manner.
29, TITLE: Underwater 3D Reconstruction Using Light Fields AUTHORS: Yuqi Ding ; Yu Ji ; Jingyi Yu ; Jinwei Ye CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper, we present an underwater 3D reconstruction solution using light field cameras.
30, TITLE: Image Recognition Via Vietoris-Rips Complex AUTHORS: Yasuhiko Asao ; Jumpei Nagase ; Ryotaro Sakamoto ; Shiro Takagi CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper, we propose a way to extract such features from images by a method based on algebraic topology.
31, TITLE: Learning to Generate Scene Graph from Natural Language Supervision AUTHORS: Yiwu Zhong ; Jing Shi ; Jianwei Yang ; Chenliang Xu ; Yin Li CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper, we propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as scene graph.
32, TITLE: GDP: Stabilized Neural Network Pruning Via Gates with Differentiable Polarization AUTHORS: YI GUO et al. CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In view of the research gaps, we present a new module named Gates with Differentiable Polarization (GDP), inspired by principled optimization ideas.
33, TITLE: GeneAnnotator: A Semi-automatic Annotation Tool for Visual Scene Graph AUTHORS: Zhixuan Zhang ; Chi Zhang ; Zhenning Niu ; Le Wang ; Yuehu Liu CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this manuscript, we introduce a semi-automatic scene graph annotation tool for images, the GeneAnnotator.
34, TITLE: Reasoning Graph Networks for Kinship Verification: from Star-shaped to Hierarchical AUTHORS: Wanhua Li ; Jiwen Lu ; Abudukelimu Wuerkaixi ; Jianjiang Feng ; Jie Zhou CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper, we investigate the problem of facial kinship verification by learning hierarchical reasoning graph networks.
35, TITLE: Learning Fine-Grained Motion Embedding for Landscape Animation AUTHORS: HONGWEI XUE et al. CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper we focus on landscape animation, which aims to generate time-lapse videos from a single landscape image. To train and evaluate on diverse time-lapse videos, we build the largest high-resolution Time-lapse video dataset with Diverse scenes, namely Time-lapse-D, which includes 16,874 video clips with over 10 million frames.
36, TITLE: From Contexts to Locality: Ultra-high Resolution Image Segmentation Via Locality-aware Contextual Correlation AUTHORS: Qi Li ; Weixiang Yang ; Wenxi Liu ; Yuanlong Yu ; Shengfeng He CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper, we innovate the widely used high-resolution image segmentation pipeline, in which an ultra-high resolution image is partitioned into regular patches for local segmentation and then the local results are merged into a high-resolution semantic mask.
37, TITLE: 3D Human Texture Estimation from A Single Image with Transformers AUTHORS: Xiangyu Xu ; Chen Change Loy CATEGORY: cs.CV [cs.CV, cs.AI, cs.GR, cs.LG, cs.MM] HIGHLIGHT: We propose a Transformer-based framework for 3D human texture estimation from a single image.
38, TITLE: Weakly Supervised Few-Shot Segmentation Via Meta-Learning AUTHORS: Pedro H. T. Gama ; Hugo Oliveira ; José Marcato Junior ; Jefersson A. dos Santos CATEGORY: cs.CV [cs.CV, cs.LG] HIGHLIGHT: In this paper, we present two novel meta learning methods, named WeaSeL and ProtoSeg, for the few-shot semantic segmentation task with sparse annotations.
39, TITLE: Revisiting 3D ResNets for Video Recognition AUTHORS: XIANZHI DU et al. CATEGORY: cs.CV [cs.CV, cs.LG, eess.IV] HIGHLIGHT: We propose a simple scaling strategy for 3D ResNets, in combination with improved training strategies and minor architectural changes.
40, TITLE: Comparing The Machine Readability of Traffic Sign Pictograms in Austria and Germany AUTHORS: Alexander Maletzky ; Stefan Thumfart ; Christoph Wruß CATEGORY: cs.CV [cs.CV, cs.LG] HIGHLIGHT: To that end, we train classification models on synthetic data sets and evaluate their classification accuracy in a controlled setting.
41, TITLE: Point-Based Neural Rendering with Per-View Optimization AUTHORS: Georgios Kopanas ; Julien Philip ; Thomas Leimkühler ; George Drettakis CATEGORY: cs.CV [cs.CV, cs.GR] HIGHLIGHT: We introduce a general approach that is initialized with MVS, but allows further optimization of scene properties in the space of input views, including depth and reprojected features, resulting in improved novel-view synthesis.
42, TITLE: Improved RAMEN: Towards Domain Generalization for Visual Question Answering AUTHORS: Bhanuka Manesha Samarasekara Vitharana Gamage ; Lim Chern Hong CATEGORY: cs.CV [cs.CV, cs.LG] HIGHLIGHT: This study provides two major improvements to the early/late fusion module and aggregation module of the RAMEN architecture, with the objective of further strengthening domain generalization.
43, TITLE: Visual Recognition with Deep Learning from Biased Image Datasets AUTHORS: Robin Vogel ; Stephan Clémençon ; Pierre Laforgue CATEGORY: cs.CV [cs.CV, cs.CY, cs.LG, stat.ML] HIGHLIGHT: In this paper, we show how biasing models, originally introduced for nonparametric estimation in (Gill et al., 1988), and recently revisited from the perspective of statistical learning theory in (Laforgue and Clémençon, 2019), can be applied to remedy these problems in the context of visual recognition.
44, TITLE: Tensor Normalization and Full Distribution Training AUTHORS: Wolfgang Fuhl CATEGORY: cs.CV [cs.CV, cs.LG] HIGHLIGHT: In this work, we introduce pixel wise tensor normalization, which is inserted after rectifier linear units and, together with batch normalization, provides a significant improvement in the accuracy of modern deep neural networks.
45, TITLE: Information Theory-Guided Heuristic Progressive Multi-View Coding AUTHORS: JIANGMENG LI et al. CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG] HIGHLIGHT: Guided by it, we build a multi-view coding method with a three-tier progressive architecture, namely Information theory-guided heuristic Progressive Multi-view Coding (IPMC).
46, TITLE: PR-Net: Preference Reasoning for Personalized Video Highlight Detection AUTHORS: RUNNAN CHEN et al. CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper, we propose a simple yet efficient preference reasoning framework (PR-Net) to explicitly take the diverse interests into account for frame-level highlight prediction.
47, TITLE: Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation AUTHORS: ZINIU WAN et al. CATEGORY: cs.CV [cs.CV] HIGHLIGHT: To this end, we propose Multi-level Attention Encoder-Decoder Network (MAED), including a Spatial-Temporal Encoder (STE) and a Kinematic Topology Decoder (KTD) to model multi-level attentions in a unified framework.
48, TITLE: ERA: Entity Relationship Aware Video Summarization with Wasserstein GAN AUTHORS: Guande Wu ; Jianzhe Lin ; Claudio T. Silva CATEGORY: cs.CV [cs.CV] HIGHLIGHT: This paper proposes a novel Entity relationship Aware video summarization method (ERA) to address the above problems.
49, TITLE: Image In Painting Applied to Art Completing Escher's Print Gallery AUTHORS: Lucia Cipolina-Kun ; Simone Caenazzo ; Gaston Mazzei ; Aditya Srinivas Menon CATEGORY: cs.CV [cs.CV, cs.AI] HIGHLIGHT: We introduce M.C. Escher's Print Gallery lithography as a use case example.
50, TITLE: Fast Image-Anomaly Mitigation for Autonomous Mobile Robots AUTHORS: Gianmario Fumagalli ; Yannick Huber ; Marcin Dymczyk ; Roland Siegwart ; Renaud Dubé CATEGORY: cs.CV [cs.CV, cs.AI, cs.RO] HIGHLIGHT: In this work we address this important issue by implementing a pre-processing step that can effectively mitigate such artifacts in a real-time fashion, thus supporting the deployment of autonomous systems with limited compute capabilities.
51, TITLE: To Be Critical: Self-Calibrated Weakly Supervised Learning for Salient Object Detection AUTHORS: Yongri Piao ; Jian Wang ; Miao Zhang ; Zhengxuan Ma ; Huchuan Lu CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this work, 1) we propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions, liberating the saliency network from error-prone propagation caused by pseudo labels.
52, TITLE: Robust Mitosis Detection Using A Cascade Mask-RCNN Approach With Domain-Specific Residual Cycle-GAN Data Augmentation AUTHORS: GAUTHIER ROY et al. CATEGORY: cs.CV [cs.CV] HIGHLIGHT: For the MIDOG mitosis detection challenge, we created a cascade algorithm consisting of a Mask-RCNN detector, followed by a classification ensemble consisting of ResNet50 and DenseNet201 to refine detected mitotic candidates.
53, TITLE: Bridging The Gap Between Events and Frames Through Unsupervised Domain Adaptation AUTHORS: Nico Messikommer ; Daniel Gehrig ; Mathias Gehrig ; Davide Scaramuzza CATEGORY: cs.CV [cs.CV] HIGHLIGHT: To overcome this drawback, we propose a task transfer method that allows models to be trained directly with labeled images and unlabeled event data.
54, TITLE: Moving Object Detection for Event-based Vision Using K-means Clustering AUTHORS: Anindya Mondal ; Mayukhmali Das CATEGORY: cs.CV [cs.CV, cs.AI] HIGHLIGHT: In this paper, we investigate the application of the k-means clustering technique in detecting moving objects in event-based data.
55, TITLE: Class Semantics-based Attention for Action Detection AUTHORS: DEEPAK SRIDHAR et al. CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper, we propose a novel attention mechanism, the Class Semantics-based Attention (CSA), that learns from the temporal distribution of semantics of action classes present in an input video to find the importance scores of the encoded features, which are used to provide attention to the more useful encoded features.
56, TITLE: The Animation Transformer: Visual Correspondence Via Segment Matching AUTHORS: EVAN CASEY et al. CATEGORY: cs.CV [cs.CV, cs.AI, cs.GR] HIGHLIGHT: To that end, we propose the Animation Transformer (AnT) which uses a transformer-based architecture to learn the spatial and visual relationships between segments across a sequence of images.
57, TITLE: Seam Carving Detection and Localization Using Two-Stage Deep Neural Networks AUTHORS: Lakshmanan Nataraj ; Chandrakanth Gudavalli ; Tajuddin Manhar Mohammed ; Shivkumar Chandrasekaran ; B. S. Manjunath CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper, we propose a two-step method to detect and localize seam carved images.
58, TITLE: Deep Saliency Prior for Reducing Visual Distraction AUTHORS: KFIR ABERMAN et al. CATEGORY: cs.CV [cs.CV, cs.GR, cs.LG] HIGHLIGHT: We present results on a variety of natural images and conduct a perceptual study to evaluate and validate the changes in viewers' eye-gaze between the original images and our edited results.
59, TITLE: A Realistic Approach to Generate Masked Faces Applied on Two Novel Masked Face Recognition Data Sets AUTHORS: TUDOR MARE et al. CATEGORY: cs.CV [cs.CV, cs.LG] HIGHLIGHT: We propose a method for enhancing data sets containing faces without masks by creating synthetic masks and overlaying them on faces in the original images.
60, TITLE: F3S: Free Flow Fever Screening AUTHORS: KUNAL RAO et al. CATEGORY: cs.CV [cs.CV, eess.IV] HIGHLIGHT: We present a novel fever-screening system, F3S, that uses edge machine learning techniques to accurately measure core body temperatures of multiple individuals in a free-flow setting.
61, TITLE: Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection AUTHORS: JIAGENG MAO et al. CATEGORY: cs.CV [cs.CV] HIGHLIGHT: We present a flexible and high-performance framework, named Pyramid R-CNN, for two-stage 3D object detection from point clouds.
62, TITLE: Navigating The Mise-en-Page: Interpretive Machine Learning Approaches to The Visual Layouts of Multi-Ethnic Periodicals AUTHORS: Benjamin Charles Germain Lee ; Joshua Ortiz Baco ; Sarah H. Salter ; Jim Casey CATEGORY: cs.CV [cs.CV, cs.DL, cs.IR] HIGHLIGHT: This paper presents a computational method of analysis that draws from machine learning, library science, and literary studies to map the visual layouts of multi-ethnic newspapers from the late 19th and early 20th century United States.
63, TITLE: Fusformer: A Transformer-based Fusion Approach for Hyperspectral Image Super-resolution AUTHORS: Jin-Fan Hu ; Ting-Zhu Huang ; Liang-Jian Deng CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper, we design a network based on the transformer for fusing the low-resolution hyperspectral images and high-resolution multispectral images to obtain the high-resolution hyperspectral images.
64, TITLE: Towards Accurate Alignment in Real-time 3D Hand-Mesh Reconstruction AUTHORS: Xiao Tang ; Tianyu Wang ; Chi-Wing Fu CATEGORY: cs.CV [cs.CV] HIGHLIGHT: This paper presents a novel pipeline by decoupling the hand-mesh reconstruction task into three stages: a joint stage to predict hand joints and segmentation; a mesh stage to predict a rough hand mesh; and a refine stage to fine-tune it with an offset mesh for mesh-image alignment.
65, TITLE: Deep Person Generation: A Survey from The Perspective of Face, Pose and Cloth Synthesis AUTHORS: Tong Sha ; Wei Zhang ; Tong Shen ; Zhoujun Li ; Tao Mei CATEGORY: cs.CV [cs.CV] HIGHLIGHT: More than two hundred papers are covered for a thorough overview, and the milestone works are highlighted to witness the major technical breakthrough.
66, TITLE: Voxel Transformer for 3D Object Detection AUTHORS: JIAGENG MAO et al. CATEGORY: cs.CV [cs.CV] HIGHLIGHT: In this paper, we resolve the problem by introducing a Transformer-based architecture that enables long-range relationships between voxels by self-attention.
67, TITLE: Hierarchical Object-to-Zone Graph for Object Navigation AUTHORS: SIXIAN ZHANG et al. CATEGORY: cs.CV [cs.CV] HIGHLIGHT: Previous works usually implement deep models to train an agent to predict actions in real-time.
68, TITLE: Parsing Table Structures in The Wild AUTHORS: RUJIAO LONG et al. CATEGORY: cs.CV [cs.CV] HIGHLIGHT: For designing such a system, we propose an approach named Cycle-CenterNet on top of CenterNet with a novel cycle-pairing module to simultaneously detect and group tabular cells into structured tables. Alongside our Cycle-CenterNet, we also present a large-scale dataset, named Wired Table in the Wild (WTW), which includes well-annotated structure parsing of multiple style tables in several scenes like photos, scanned files, web pages, etc.
69, TITLE: Efficient Action Recognition Using Confidence Distillation AUTHORS: Shervin Manzuri Shalmani ; Fei Chiang ; Rong Zheng CATEGORY: cs.CV [cs.CV, cs.AI] HIGHLIGHT: To mitigate both these issues, we propose the confidence distillation framework to teach a representation of uncertainty of the teacher to the student sampler and divide the task of full video prediction between the student and the teacher models.
70, TITLE: RAMA: A Rapid Multicut Algorithm on GPU AUTHORS: Ahmed Abbas ; Paul Swoboda CATEGORY: cs.DC [cs.DC, cs.CV, cs.DS, cs.LG] HIGHLIGHT: We propose a highly parallel primal-dual algorithm for the multicut (a.k.a. correlation clustering) problem, a classical graph clustering problem widely used in machine learning and computer vision.
71, TITLE: CodeNeRF: Disentangled Neural Radiance Fields for Object Categories AUTHORS: Wonbong Jang ; Lourdes Agapito CATEGORY: cs.GR [cs.GR, cs.CV, cs.LG] HIGHLIGHT: We conduct experiments on the SRN benchmark, which show that CodeNeRF generalises well to unseen objects and achieves on-par performance with methods that require known camera pose at test time.
72, TITLE: Sensor Data Augmentation with Resampling for Contrastive Learning in Human Activity Recognition AUTHORS: Jinqiang Wang ; Tao Zhu ; Jingyuan Gan ; Huansheng Ning ; Yaping Wan CATEGORY: cs.HC [cs.HC, cs.CV] HIGHLIGHT: To optimize the effect of contrast learning models, in this paper, we investigate the sampling frequency of sensors and propose a resampling data augmentation method.
73, TITLE: Improving Joint Learning of Chest X-Ray and Radiology Report By Word Region Alignment AUTHORS: ZHANGHEXUAN JI et al. CATEGORY: cs.LG [cs.LG, cs.AI, cs.CL, cs.CV, eess.IV] HIGHLIGHT: This paper proposes a Joint Image Text Representation Learning Network (JoImTeRNet) for pre-training on chest X-ray images and their radiology reports.
74, TITLE: Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss AUTHORS: Jung Hyun Lee ; Jihun Yun ; Sung Ju Hwang ; Eunho Yang CATEGORY: cs.LG [cs.LG, cs.CV] HIGHLIGHT: In this work, we propose a novel quantization method for neural networks, Cluster-Promoting Quantization (CPQ) that finds the optimal quantization grids while naturally encouraging the underlying full-precision weights to gather around those quantization grids cohesively during training.
75, TITLE: Fair Federated Learning for Heterogeneous Face Data AUTHORS: Samhita Kanaparthy ; Manisha Padala ; Sankarshan Damle ; Sujit Gujar CATEGORY: cs.LG [cs.LG, cs.CV, cs.CY] HIGHLIGHT: To resolve this challenge, we propose several aggregation techniques.
76, TITLE: Sparse-MLP: A Fully-MLP Architecture with Conditional Computation AUTHORS: Yuxuan Lou ; Fuzhao Xue ; Zangwei Zheng ; Yang You CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV] HIGHLIGHT: In this paper, we propose Sparse-MLP, scaling the recent MLP-Mixer model with sparse MoE layers, to achieve a more computation-efficient architecture.
77, TITLE: Active Learning for Automated Visual Inspection of Manufactured Products AUTHORS: Elena Trajkova ; Jože M. Rožanec ; Paulien Dam ; Blaž Fortuna ; Dunja Mladenić CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV] HIGHLIGHT: In this research, we compare three active learning approaches and five machine learning algorithms applied to visual defect inspection with real-world data provided by Philips Consumer Lifestyle BV.
78, TITLE: Multi-Agent Variational Occlusion Inference Using People As Sensors AUTHORS: Masha Itkina ; Ye-Ji Mun ; Katherine Driggs-Campbell ; Mykel J. Kochenderfer CATEGORY: cs.RO [cs.RO, cs.AI, cs.CV, cs.LG, cs.MA, I.2.9; I.2.10] HIGHLIGHT: We propose an occlusion inference method that characterizes observed behaviors of human agents as sensor measurements, and fuses them with those from a standard sensor suite.
79, TITLE: Navigational Path-Planning For All-Terrain Autonomous Agricultural Robot AUTHORS: Vedant Ghodke CATEGORY: cs.RO [cs.RO, cs.CV] HIGHLIGHT: This report paper compares novel algorithms for autonomous navigation of farmlands.
80, TITLE: Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks AUTHORS: Russell Sammut Bonnici ; Charalampos Saitis ; Martin Benning CATEGORY: cs.SD [cs.SD, cs.AI, cs.CV, cs.LG, eess.AS] HIGHLIGHT: This research project investigates the application of deep learning to timbre transfer, where the timbre of a source audio can be converted to the timbre of a target audio with minimal loss in quality.
81, TITLE: Generative Models Improve Radiomics Performance in Different Tasks and Different Datasets: An Experimental Study AUTHORS: Junhua Chen ; Inigo Bermejo ; Andre Dekker ; Leonard Wee CATEGORY: q-bio.QM [q-bio.QM, cs.CV, eess.IV] HIGHLIGHT: In this article, we investigate the possibility of using deep learning generative models to improve the performance of radiomics from low dose CTs.
82, TITLE: Predicting Isocitrate Dehydrogenase Mutation Status in Glioma Using Structural Brain Networks and Graph Neural Networks AUTHORS: YIRAN WEI et al. CATEGORY: eess.IV [eess.IV, cs.CV, q-bio.NC] HIGHLIGHT: Here we propose a method to predict the IDH mutation using GNN, based on the structural brain network of patients.
83, TITLE: A Privacy-Preserving Image Retrieval Scheme Using A Codebook Generated From Independent Plain-Image Dataset AUTHORS: Kenta Iida ; Hitoshi Kiya CATEGORY: eess.IV [eess.IV, cs.CV, cs.MM] HIGHLIGHT: In this paper, we propose a privacy-preserving image-retrieval scheme using a codebook generated by using a plain-image dataset.
84, TITLE: OCTAVA: An Open-source Toolbox for Quantitative Analysis of Optical Coherence Tomography Angiography Images AUTHORS: GAVRIELLE R. UNTRACHT et al. CATEGORY: eess.IV [eess.IV, cs.CV, q-bio.TO] HIGHLIGHT: With the goal of contributing to standardization of OCTA data analysis, we report a user-friendly, open-source toolbox, OCTAVA (OCTA Vascular Analyzer), to automate the pre-processing, segmentation, and quantitative analysis of en face OCTA maximum intensity projection images in a standardized workflow.
85, TITLE: Right Ventricular Segmentation from Short- and Long-Axis MRIs Via Information Transition AUTHORS: Lei Li ; Wangbin Ding ; Liqun Huang ; Xiahai Zhuang CATEGORY: eess.IV [eess.IV, cs.CV] HIGHLIGHT: In this work, we propose an automatic RV segmentation framework, where the information from long-axis (LA) views is utilized to assist the segmentation of short-axis (SA) views via information transition.
86, TITLE: Recognition of COVID-19 Disease Utilizing X-Ray Imaging of The Chest Using CNN AUTHORS: Md Gulzar Hussain ; Ye Shiren CATEGORY: eess.IV [eess.IV, cs.CV] HIGHLIGHT: The goal of this research is to assess convolutional neural networks (CNNs) for diagnosing COVID-19 from chest X-ray images.
87, TITLE: Automated Cardiac Resting Phase Detection Targeted on The Right Coronary Artery AUTHORS: SEUNG SU YOON et al. CATEGORY: eess.IV [eess.IV, cs.CV, physics.med-ph] HIGHLIGHT: The purpose of this work is to propose a fully automated framework that allows the detection of the right coronary artery (RCA) RP within CINE series.
88, TITLE: Automatic Segmentation of The Optic Nerve Head Region in Optical Coherence Tomography: A Methodological Review AUTHORS: RITA MARQUES et al. CATEGORY: eess.IV [eess.IV, cs.CV, physics.med-ph] HIGHLIGHT: This review summarizes the current state-of-the-art in automatic segmentation of the ONH in OCT.
89, TITLE: Deep Learning Facilitates Fully Automated Brain Image Registration of Optoacoustic Tomography and Magnetic Resonance Imaging AUTHORS: YEXING HU et al. CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG] HIGHLIGHT: Here we propose a fully automated registration method for MSOT-MRI multimodal imaging empowered by deep learning.
90, TITLE: (M)SLAe-Net: Multi-Scale Multi-Level Attention Embedded Network for Retinal Vessel Segmentation AUTHORS: Shreshth Saini ; Geetika Agrawal CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG] HIGHLIGHT: In this work, we propose a multi-scale, multi-level attention embedded CNN architecture ((M)SLAe-Net) to address the issue of multi-stage processing for robust and precise segmentation of retinal vessels.
91, TITLE: Evaluation of Convolutional Neural Networks for COVID-19 Classification on Chest X-Rays AUTHORS: FELIPE ANDRÉ ZEISER et al. CATEGORY: eess.IV [eess.IV, cs.CV] HIGHLIGHT: In this paper, we propose the evaluation of convolutional neural networks to identify pneumonia due to COVID-19 in XR.
92, TITLE: A Decoupled Uncertainty Model for MRI Segmentation Quality Estimation AUTHORS: Richard Shaw ; Carole H. Sudre ; Sebastien Ourselin ; M. Jorge Cardoso ; Hugh G. Pemberton CATEGORY: eess.IV [eess.IV, cs.CV] HIGHLIGHT: We aim to automate the process using a probabilistic network that estimates segmentation uncertainty through a heteroscedastic noise model, providing a measure of task-specific quality.
93, TITLE: Image Compression with Recurrent Neural Network and Generalized Divisive Normalization AUTHORS: Khawar Islam ; L. Minh Dang ; Sujin Lee ; Hyeonjoon Moon CATEGORY: eess.IV [eess.IV, cs.CV, cs.MM] HIGHLIGHT: In this paper, two effective novel blocks are developed: analysis and synthesis block that employs the convolution layer and Generalized Divisive Normalization (GDN) in the variable-rate encoder and decoder side.
94, TITLE: Multi-View Spatial-Temporal Graph Convolutional Networks with Domain Generalization for Sleep Stage Classification AUTHORS: ZIYU JIA et al. CATEGORY: eess.SP [eess.SP, cs.AI, cs.CV, cs.LG] HIGHLIGHT: To address the above challenges, we propose a multi-view spatial-temporal graph convolutional networks (MSTGCN) with domain generalization for sleep stage classification.