Advances in Deep Learning Methods for Visual Tracking: Literature Review and Fundamentals

Zhang, Xiaoqin; Jiang, Runhua; Fan, Chenxiang; Tong, Tian-Yu; Wang, Tao; Huang, Pengcheng

doi:10.1007/s11633-020-1274-8

Cited by 16 publications

(3 citation statements)

References 185 publications

“…The core objective of DL research is to develop systems capable of learning intricate patterns from data and executing tasks with minimal human intervention. This spectrum of tasks encompasses diverse applications, ranging from automatic speech recognition [91] and multilingual text translation [105] to object tracking in videos [146] and analysis of medical imaging for disease diagnosis [85].…”

Section: Deep Learning (mentioning)

confidence: 99%

Generalisation and reliability of deep learning for digital pathology in a clinical setting

Pocevičiūtė

2023

Linköping Studies in Science and Technology. Dissertations


“…Yang et al [134] proposed a cross-modal relationship extractor (CMRE) to adaptively highlight objects and relationships with a cross-modal attention mechanism, and represented the extracted information as a language-guided visual relation graph. Furthermore, Yang et al [22] proposed a cross-modal relationship extractor to adaptively highlight objects and relationships (spatial and semantic relations) related to the given expression with a cross-modal attention mechanism, and represent the extracted information as a language-guided visual relation graph. Yang et al [150] proposed a scene graph-guided modular network (SGMN), which performed reasoning over a semantic graph and a scene graph with neural modules under the guidance of the linguistic structure of the expression.…”

Section: Visual Representation Learning: State-of-the-art (mentioning)

confidence: 99%

Causal Reasoning Meets Visual Representation Learning: A Prospective Study

et al. 2022

Visual representation learning is ubiquitous in various real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing. Due to the emergence of huge amounts of multimodal heterogeneous spatial/temporal/spatial-temporal data in the big data era, the lack of interpretability, robustness, and out-of-distribution generalization are becoming the challenges of the existing visual models. The majority of the existing methods tend to fit the original data/variable distributions and ignore the essential causal relations behind the multi-modal knowledge, which lacks unified guidance and analysis about why modern visual representation learning methods easily collapse into data bias and have limited generalization and cognitive abilities. Inspired by the strong inference ability of human-level agents, recent years have therefore witnessed great effort in developing causal reasoning paradigms to realize robust representation and model learning with good cognitive ability. In this paper, we conduct a comprehensive review of existing causal reasoning methods for visual representation learning, covering fundamental theories, models, and datasets. The limitations of current methods and datasets are also discussed. Moreover, we propose some prospective challenges, opportunities, and future research directions for benchmarking causal reasoning algorithms in visual representation learning. This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussions, bring to the forefront the urgency of developing novel causal reasoning methods, publicly available benchmarks, and consensus-building standards for reliable visual representation learning and related real-world applications more efficiently.

“…Object tracking is constantly determining a moving object's trajectory from measurements taken by one or more sensors [1]. Single-object tracking (SOT) [2] and Multi-object tracking (MOT) [3][4][5][6][7] are two main categories of object tracking methods (MOT). When using SOT, the tracker follows a single, predetermined object.…”

Section: Introduction (mentioning)

confidence: 99%

Improved DeepSORT-Based Object Tracking in Foggy Weather for AVs Using Sematic Labels and Fused Appearance Feature Network

Ogunrinde, Bernadin

2023

Preprint

The presence of fog in the background can prevent small and distant objects from being detected, let alone tracked. Under safety-critical conditions, multi-object tracking models require faster-tracking speed while maintaining high object-tracking accuracy. The original DeepSORT algorithm used YOLOv4 for the detection phase, and a simple neural network for deep appearance descriptor. Consequently, the feature map generated loses relevant details about the track being matched with a given detection in fog. Targets with a high degree of appearance similarity on the detection frame are more likely to be mismatched, resulting in identity switches or track failures in heavy fog. We propose an improved multi-object tracking model based on the DeepSORT algorithm to improve tracking accuracy and speed under foggy weather conditions. First, we employed our camera-radar fusion network (CR-YOLOnet) in the detection phase for faster and more accurate object detection. We proposed an appearance feature network to replace the basic convolutional neural network. We incorporated GhostNet to take the place of the traditional convolutional layers to generate more features and reduce computational complexities and cost. We adopted a segmentation module and fed the semantic labels of the corresponding input frame to add rich semantic information to the low-level appearance feature maps. Our proposed method outperformed YOLOv5 + DeepSORT with a 35.15% increase in multi-object tracking accuracy, a 32.65% increase in multi-object tracking precision, the speed increased by 37.56%, and identity switches decreased by 46.81%.
