Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection

Woo, Sanghyun; Park, Kwanyong; Oh, Seoung Wug; Kweon, In So; Lee, Joon-Young

doi:10.1007/978-3-031-19806-9_14

“…Learning tracking from static images. Since labelled video data is expensive to acquire at scale, recent methods have proposed to use static images to supervise MOT methods [16,66,74,77]. CenterTrack [77] proposes to learn motion offsets from static images by random translation of the input, while FairMOT [74] treats objects in a dataset of static images as unique classes to distinguish.…”

Section: Related Work (mentioning)

confidence: 99%
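The two static-image strategies in the snippet above can be sketched on the label side as follows. This is a minimal illustration under stated assumptions, not CenterTrack's or FairMOT's code. `make_pseudo_pair` simulates a "previous frame" by randomly translating one image's boxes, so the ground-truth motion offset of every object is known. `assign_identity_labels` gives each object in a static-image dataset its own class index, in the spirit of FairMOT's identity classification. All function names, the `max_shift` parameter and the toy boxes are hypothetical, and only box coordinates are shifted; a full pipeline would apply the same translation to the pixels.

```python
import random
from typing import List, Tuple

# (x1, y1, x2, y2) in pixel coordinates
Box = Tuple[float, float, float, float]


def box_center(box: Box) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def make_pseudo_pair(
    boxes: List[Box], max_shift: float, rng: random.Random
) -> Tuple[List[Box], List[Tuple[float, float]]]:
    """Simulate a 'previous frame' by translating one static image.

    Only the box labels are shifted here. Returns the shifted boxes and the
    per-object motion offset (current center minus previous center).
    """
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    prev_boxes = [(x1 + dx, y1 + dy, x2 + dx, y2 + dy) for x1, y1, x2, y2 in boxes]
    offsets = []
    for cur, prev in zip(boxes, prev_boxes):
        (cx, cy), (px, py) = box_center(cur), box_center(prev)
        offsets.append((cx - px, cy - py))
    return prev_boxes, offsets


def assign_identity_labels(images: List[List[Box]]) -> List[List[int]]:
    """Identity labels: every object in the dataset becomes its own class."""
    labels: List[List[int]] = []
    next_id = 0
    for boxes in images:
        labels.append(list(range(next_id, next_id + len(boxes))))
        next_id += len(boxes)
    return labels


if __name__ == "__main__":
    rng = random.Random(0)
    boxes = [(10.0, 10.0, 50.0, 80.0), (100.0, 40.0, 140.0, 90.0)]
    prev, offsets = make_pseudo_pair(boxes, max_shift=8.0, rng=rng)
    print(offsets)  # (approximately) the same (-dx, -dy) for every object
    print(assign_identity_labels([boxes, boxes[:1]]))  # [[0, 1], [2]]
```

Because a global translation moves every object identically, the offsets carry no real motion diversity; this is one reason such static-image supervision is usually described as an approximation of video training.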

OVTrack: Open-Vocabulary Multiple Object Tracking

Li, Fischer, Ke et al. 2023

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)



The ability to recognize, localize and track dynamic objects in a scene is fundamental to many real-world applications, such as self-driving and robotic systems. Yet, traditional multiple object tracking (MOT) benchmarks rely only on a few object categories that hardly represent the multitude of possible objects that are encountered in the real world. This leaves contemporary MOT methods limited to a small set of pre-defined object categories. In this paper, we address this limitation by tackling a novel task, open-vocabulary MOT, that aims to evaluate tracking beyond predefined training categories. We further develop OVTrack, an open-vocabulary tracker that is capable of tracking arbitrary object classes. Its design is based on two key ingredients: First, leveraging vision-language models for both classification and association via knowledge distillation; second, a data hallucination strategy for robust appearance feature learning from denoising diffusion probabilistic models. The result is an extremely data-efficient open-vocabulary tracker that sets a new state-of-the-art on the large-scale, large-vocabulary TAO benchmark, while being trained solely on static images.
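The abstract names the vision-language ingredient but not its mechanics. The sketch below shows, under stated assumptions, the general pattern such trackers build on; it is not OVTrack's implementation. A region embedding is classified by softmax over cosine similarities to text embeddings of class names (so new classes only need a new name embedding), a detector's region embedding is pulled toward a teacher image embedding with an L1 distillation loss (a ViLD-style formulation, assumed here rather than taken from the paper), and tracks are matched to detections greedily by appearance similarity. All function names, the temperature of 0.01, the 0.5 threshold and the toy 2-D vectors are hypothetical; in practice the embeddings would come from a vision-language model such as CLIP. The diffusion-based data hallucination is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    # Assumes a non-zero vector.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def classify_region(region_emb: Vector, text_embs: Dict[str, Vector],
                    temperature: float = 0.01) -> Dict[str, float]:
    """Softmax over cosine similarities between a region embedding and the
    text embedding of each class name (open-vocabulary classification)."""
    names = list(text_embs)
    logits = [cosine(region_emb, text_embs[n]) / temperature for n in names]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return {n: e / total for n, e in zip(names, exps)}


def distillation_l1(student_emb: Vector, teacher_emb: Vector) -> float:
    """L1 distance between normalized student (detector) and teacher
    (vision-language image encoder) embeddings of the same region."""
    return sum(abs(s - t) for s, t in zip(normalize(student_emb), normalize(teacher_emb)))


def associate(track_embs: List[Vector], det_embs: List[Vector],
              threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Greedy one-to-one matching of tracks to detections by cosine similarity."""
    pairs = sorted(
        ((cosine(t, d), i, j) for i, t in enumerate(track_embs) for j, d in enumerate(det_embs)),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), []
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs are even less similar
        if i in used_tracks or j in used_dets:
            continue
        matches.append((i, j))
        used_tracks.add(i)
        used_dets.add(j)
    return matches


if __name__ == "__main__":
    text_embs = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
    print(classify_region([0.9, 0.1], text_embs))  # "cat" gets nearly all mass
    print(associate([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.1]]))  # [(1, 0), (0, 1)]
```

Greedy matching is used only to keep the example short; a tracker could equally use Hungarian matching or a learned association score.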


“…Learning tracking from static images. Since labelled video data is expensive to acquire at scale, recent methods have proposed to use static images to supervise MOT methods [18,69,77,80]. CenterTrack [80] proposes to learn motion offsets from static images by random translation of the input, while FairMOT [77] treats objects in a dataset of static images as unique classes to distinguish.…”

Section: Related Work (mentioning)

confidence: 99%

Tracking Every Thing in the Wild

Liu, Danelljan, Ding et al. 2022

Lecture Notes in Computer Science




“…Object Detection. Traditional detection models are trained to detect objects for a pre-defined set of categories [4,52,53,67]. As a result, traditional models find it challenging to adapt to new tasks and domains, unable to differentiate between objects that vary in attributes such as texture, shape, and other characteristics.…”

Section: Related Work (mentioning)

confidence: 99%

Anomaly detection in particulate matter sensor using hypothesis pruning generative adversarial network

Park, Park, Kim 2020


The World Health Organization provides guidelines for managing the particulate matter (PM) level because a higher PM level represents a threat to human health. To manage the PM level, a procedure for measuring the PM value is first needed. We use a PM sensor that measures the PM level using the laser‐based light scattering (LLS) method because it is more cost‐effective than sensors based on a beta attenuation monitor or a tapered element oscillating microbalance. However, an LLS‐based sensor is more likely to malfunction than these higher‐cost sensors. In this paper, we regard all malfunctions, including the collection of abnormal values and missing data, as anomalies, and we aim to detect these anomalies to support the maintenance of PM measuring sensors. To this end, we propose a novel architecture that we call the hypothesis pruning generative adversarial network (HP‐GAN). In comparative experiments, HP‐GAN achieves AUROC and AUPRC values of 0.948 and 0.967, respectively, in detecting anomalies in LLS‐based PM measuring sensors. We conclude that HP‐GAN is a cutting‐edge model for anomaly detection.
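AUROC and AUPRC summarize how well anomaly scores rank faulty readings above normal ones. The sketch below computes both from per-sample scores, assuming label 1 marks an anomaly and a higher score means "more anomalous". AUROC uses the pairwise (Mann-Whitney) formulation; AUPRC is estimated as average precision, which is one common estimator. The abstract does not state which estimator the authors used, and the function names, labels and scores are hypothetical.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random anomaly (label 1) scores higher than a
    random normal sample (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve: the mean precision at
    the rank of each true anomaly (assumes distinct scores)."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one anomaly")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this anomaly's rank
    return total / n_pos


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(auroc(labels, scores))              # 0.75
    print(average_precision(labels, scores))  # 0.8333... (= (1/1 + 2/3) / 2)
```

The pairwise AUROC loop is quadratic in the number of samples; for large sensor logs a rank-based formula over sorted scores gives the same value more cheaply.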


Bridging Images and Videos: A Simple Learning Framework for Large Vocabulary Video Object Detection: cited by 6 publications; references 81 publications.

