2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2018.00247

Memory Based Online Learning of Deep Representations from Video Streams

Abstract: We present a novel online unsupervised method for face identity learning from video streams. The method exploits deep face descriptors together with a memory-based learning mechanism that takes advantage of the temporal coherence of visual data. Specifically, we introduce a discriminative feature matching solution based on Reverse Nearest Neighbour and a feature forgetting strategy that detects redundant features and discards them appropriately as time progresses. It is shown that the proposed learning proced…
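The two mechanisms named in the abstract, Reverse Nearest Neighbour matching against a feature memory and forgetting of redundant features over time, can be illustrated with a minimal sketch. This is not the authors' implementation; the cosine-similarity choice, the function names, and the forgetting threshold are assumptions made for illustration (features assumed L2-normalised).

import numpy as np

def reverse_nearest_neighbours(query, memory):
    # Indices of stored features whose nearest neighbour, once the query is
    # added to the pool, is the query itself (cosine similarity on
    # L2-normalised features; an assumption, not the paper's exact metric).
    sims_to_query = memory @ query           # similarity of each stored feature to the query
    sims_within = memory @ memory.T          # pairwise similarities inside the memory
    np.fill_diagonal(sims_within, -np.inf)   # a feature is not its own neighbour
    best_within = sims_within.max(axis=1)
    return np.where(sims_to_query > best_within)[0]

def forget_redundant(memory, threshold=0.95):
    # Discard a stored feature when a newer one is nearly identical to it
    # (illustrative threshold); this keeps the memory compact as time passes.
    keep = []
    for i, f in enumerate(memory):
        newer = memory[i + 1:]
        if newer.size == 0 or float((newer @ f).max()) < threshold:
            keep.append(i)
    return memory[keep]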

Cited by 28 publications (20 citation statements); references 56 publications.

“…Sharma et al [67] used instead a Recurrent Rolling Convolution (RRC) CNN [68] and a SubCNN [69] to detect vehicles in videos recorded on a moving camera in the context of autonomous driving (see section 3.2.4). Pernici et al [70] used the Tiny CNN detector [71] in their face tracking algorithm, obtaining a better performance when compared to the Deformable Parts Model detector (DPM) [25], that does not use deep learning techniques.…”
Section: Other Detectors (mentioning); confidence: 99%
“…The authors in [100] used a fine-tuned GoogLeNet on the ILSVRC CLS-LOC [101] dataset for pedestrians recognition. In [70], the authors reused the visual features extracted by the CNN-based detector, and the association was performed using a Reverse Nearest Neighbor technique [102]. Sheng et al [103] employed the convolutional part of GoogLeNet to extract appearance features, using the cosine distance between them to compute an affinity score between pairs of detections, and merging that information with motion prediction in order to compute an overall affinity which serves as edge cost in a graph problem.…”
Section: CNNs as Visual Feature Extractors (mentioning); confidence: 99%
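The appearance-plus-motion affinity described in the citation above (cosine similarity between CNN features merged with motion prediction to form a graph edge cost) can be sketched as follows. The IoU-based motion term and the weight w_app are illustrative assumptions, not the cited authors' code.

import numpy as np

def cosine_affinity(feat_a, feat_b):
    # Appearance affinity from CNN features (higher means more similar).
    a = feat_a / np.linalg.norm(feat_a)
    b = feat_b / np.linalg.norm(feat_b)
    return float(a @ b)

def motion_affinity(pred_box, det_box):
    # IoU between a motion-predicted box and a detected box, (x1, y1, x2, y2).
    x1 = max(pred_box[0], det_box[0]); y1 = max(pred_box[1], det_box[1])
    x2 = min(pred_box[2], det_box[2]); y2 = min(pred_box[3], det_box[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(pred_box) + area(det_box) - inter + 1e-9)

def edge_cost(feat_a, feat_b, pred_box, det_box, w_app=0.5):
    # Overall affinity turned into a graph edge cost (lower means better match).
    affinity = w_app * cosine_affinity(feat_a, feat_b) \
               + (1.0 - w_app) * motion_affinity(pred_box, det_box)
    return 1.0 - affinity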
“…Their approach extends Nearest Class Mean classifier to operate in an open world setting by re-calibrating the class probabilities to balance open space risk. [46] studies open world face identity learning while [63] proposed to use an exemplar set of seen classes to match them against a new sample, and rejects it in case of a low match with all previously known classes. However, they don't test on image classification benchmarks and study product classification in e-commerce applications.…”
Section: Related Work (mentioning); confidence: 99%
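The exemplar-matching-with-rejection idea summarised in the citation above can be sketched in a few lines; the similarity measure, the threshold value, and the names are illustrative assumptions rather than the cited method.

import numpy as np

def classify_open_world(sample, exemplars, labels, reject_threshold=0.7):
    # Match a sample against exemplars of known classes and reject it as
    # "unknown" when the best similarity is low (illustrative threshold).
    sample = sample / np.linalg.norm(sample)
    exemplars = exemplars / np.linalg.norm(exemplars, axis=1, keepdims=True)
    sims = exemplars @ sample
    best = int(np.argmax(sims))
    if sims[best] < reject_threshold:
        return "unknown"
    return labels[best]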
“…However, having redundant visual exemplars not only slows down the tracker, but also makes the tracker become biased and eventually drift away from the target. Therefore, we adopt the reverse nearest neighbor algorithm [22,32] and add Z t to Z if the reverse nearest neighbor set of Z t with Z is an empty set. The rationale is that we add Z t to Z only if the new exemplar "looks" different to its past, and therefore the memory captures the temporal appearance variations of the target.…”
Section: Memory Management Mechanism (MMM) (mentioning); confidence: 99%
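The insertion rule quoted above (store Z_t only when its reverse nearest neighbour set within the memory Z is empty) can be written compactly. This is a sketch under assumed unit-normalised features and illustrative names, not the tracker's code.

import numpy as np

def should_add_exemplar(z_t, memory):
    # Add z_t only if no stored exemplar has z_t as its nearest neighbour,
    # i.e. the reverse nearest neighbour set of z_t within the memory is empty,
    # so the new exemplar "looks" different from the stored appearance history.
    if len(memory) == 0:
        return True
    sims_to_new = memory @ z_t               # similarity of each stored exemplar to z_t
    sims_within = memory @ memory.T
    np.fill_diagonal(sims_within, -np.inf)   # ignore self-similarity
    best_within = sims_within.max(axis=1)
    return bool(np.all(sims_to_new <= best_within))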