Object-aware Contrastive Learning for Debiased Scene Representation

Mo, Sangwoo; Kang, Hyunwoo; Sohn, Kihyuk; Li, Chun-Liang; Shin, Jinwoo

doi:10.48550/arxiv.2108.00049

Cited by 3 publications

(2 citation statements)

References 48 publications


“…Some contrasting learning methods [41, 42, 43] are proposed to enhance self-supervised learning performance with the contrastive learning [42] aims to tackle background the bias problem in contrastive learning. Ref.…”

Section: Related Work (mentioning)

confidence: 99%

Effective Multi-Object Tracking via Global Object Models and Object Constraint Learning

Yoo, Lee, Bae 2022

Sensors


Effective multi-object tracking is still challenging due to the trade-off between tracking accuracy and speed. Because recent multi-object tracking (MOT) methods leverage object appearance and motion models to associate detections between consecutive frames, the key to effective multi-object tracking is to reduce the computational complexity of learning both models. To this end, this work proposes global appearance and motion models to discriminate multiple objects instead of learning local object-specific models. Concretely, it learns a global appearance model using contrastive learning between object appearances. In addition, it learns a global relation motion model using relative motion learning between objects. Moreover, this paper proposes object constraint learning for improving tracking efficiency. This study considers the discriminability of the models as a constraint, and learns both models when inconsistency with the constraint occurs. Therefore, object constraint learning differs from conventional online learning for multi-object tracking, which updates learnable parameters per frame. This work incorporates global models and object constraint learning into the confidence-based association method, and compares the resulting tracker with state-of-the-art methods on the publicly available MOT Challenge datasets. As a result, the tracker achieves 64.5% MOTA (multi-object tracking accuracy) and 6.54 Hz tracking speed on the MOT16 test dataset. The comparison results show that the proposed methods can improve tracking accuracy and tracking speed together.
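For readers unfamiliar with the MOTA figure quoted in the abstract above, here is a minimal sketch of the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT, with all counts summed over frames. The function name `mota` and the example counts are hypothetical and are not taken from the paper.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Multi-object tracking accuracy: 1 - (FN + FP + IDSW) / GT.

    All arguments are totals accumulated over every frame of a sequence.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


# Hypothetical counts for illustration only (not results from the paper).
score = mota(false_negatives=300, false_positives=80, id_switches=20, num_gt=1000)
print(f"MOTA = {score:.1%}")
```

Because the error terms are summed, MOTA can become negative when a tracker makes more errors than there are ground-truth objects.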


“…In this aspect, various benchmarks have been proposed to measure the robustness under distribution shifts [9,14,23,25,26,29,30,45,48,50], and this problem has been extensively studied in broad research fields [3,4,10,15,16,24,38,39,40,43,52,55,62]. Among them, benchmarking robustness [23] and resolving scene bias [10,42] or distribution shift [43,59] are the most related to our problem setup. Different from the aforementioned works, we first explore the background shift issue in the CSLR task with a newly synthesized benchmark.…”

Section: Related Work (mentioning)

confidence: 99%

Signing Outside the Studio: Benchmarking Background Robustness for Continuous Sign Language Recognition

Jang, Oh, Cho et al. 2022

Preprint


The goal of this work is background-robust continuous sign language recognition. Most existing Continuous Sign Language Recognition (CSLR) benchmarks have fixed backgrounds and are filmed in studios with a static monochromatic background. However, signing is not limited only to studios in the real world. In order to analyze the robustness of CSLR models under background shifts, we first evaluate existing state-of-the-art CSLR models on diverse backgrounds. To synthesize the sign videos with a variety of backgrounds, we propose a pipeline to automatically generate a benchmark dataset utilizing existing CSLR benchmarks. Our newly constructed benchmark dataset consists of diverse scenes to simulate a real-world environment. We observe even the most recent CSLR method cannot recognize glosses well on our new dataset with changed backgrounds. In this regard, we also propose a simple yet effective training scheme including (1) background randomization and (2) feature disentanglement for CSLR models. The experimental results on our dataset demonstrate that our method generalizes well to other unseen background data with minimal additional training images. Our dataset is available here.


PreViTS: Contrastive Pretraining with Video Tracking Supervision

Chen, Selvaraju, Chang et al. 2021

Preprint


Videos are a rich source for self-supervised learning (SSL) of visual representations due to the presence of natural temporal transformations of objects. However, current methods typically randomly sample video clips for learning, which results in a poor supervisory signal. In this work, we propose PreViTS, an SSL framework that utilizes an unsupervised tracking signal for selecting clips containing the same object, which helps better utilize temporal transformations of objects. PreViTS further uses the tracking signal to spatially constrain the frame regions to learn from and trains the model to locate meaningful objects by providing supervision on Grad-CAM attention maps. To evaluate our approach, we train a momentum contrastive (MoCo) encoder on VGG-Sound and Kinetics-400 datasets with PreViTS. Training with PreViTS outperforms representations learnt by MoCo alone on both image recognition and video classification downstream tasks, obtaining state-of-the-art performance on action classification. PreViTS helps learn feature representations that are more robust to changes in background and context, as seen by experiments on image and video datasets with background changes. Learning from large-scale uncurated videos with PreViTS could lead to more accurate and robust visual feature representations.
