Yong Jae Lee scite author profile

We introduce Spatial-Temporal Memory Networks for video object detection. At its core, a novel Spatial-Temporal Memory module (STMM) serves as the recurrent computation unit to model long-term temporal appearance and motion dynamics. The STMM's design enables full integration of pretrained backbone CNN weights, which we find to be critical for accurate detection. Furthermore, in order to tackle object motion in videos, we propose a novel MatchTrans module to align the spatial-temporal memory from frame to frame. Our method produces state-of-the-art results on the benchmark ImageNet VID dataset, and our ablative studies clearly demonstrate the contribution of our different design choices. We release our code and models at


Learning to Anonymize Faces for Privacy Preserving Action Detection

Ren, Lee, Ryoo (2018)

151


Password-Conditioned Anonymization and Deanonymization with Face Identity Transformers

Luo, Ryoo, et al. (2020)


DOCK: Detecting Objects by Transferring Common-Sense Knowledge

Singh, Divvala, Farhadi, et al. (2018)


We present a scalable approach for Detecting Objects by transferring Common-sense Knowledge (DOCK) from source to target categories. In our setting, the training data for the source categories have bounding box annotations, while those for the target categories only have image-level annotations. Current state-of-the-art approaches focus on image-level visual or semantic similarity to adapt a detector trained on the source categories to the new target categories. In contrast, our key idea is to (i) use similarity not at the image-level, but rather at the region-level, and (ii) leverage richer common-sense cues (based on attributes, spatial relations, etc.) to guide the algorithm towards learning the correct detections. We acquire such common-sense cues automatically from readily available knowledge bases without any extra human effort. On the challenging MS COCO dataset, we find that common-sense knowledge can substantially improve detection performance over existing transfer-learning baselines.




Yong Jae Lee

YOLACT++: Better Real-Time Instance Segmentation

Video Object Detection with an Aligned Spatial-Temporal Memory

Learning to Anonymize Faces for Privacy Preserving Action Detection

Password-Conditioned Anonymization and Deanonymization with Face Identity Transformers

DOCK: Detecting Objects by Transferring Common-Sense Knowledge
