Changhu Wang scite author profile

Multiple-object tracking (MOT) is mostly dominated by complex, multi-step tracking-by-detection algorithms, which perform object detection, feature extraction, and temporal association separately. The query-key mechanism in single-object tracking (SOT), which tracks the object in the current frame using the object feature from the previous frame, has great potential for setting up a simple joint-detection-and-tracking MOT paradigm. Nonetheless, the query-key method is seldom studied because of its inability to detect new-coming objects. In this work, we propose TransTrack, a baseline for MOT with Transformer. It takes advantage of the query-key mechanism and introduces a set of learned object queries into the pipeline to enable detecting new-coming objects. TransTrack has three main advantages: (1) It is an online joint-detection-and-tracking pipeline based on the query-key mechanism. Complex and multi-step components in previous methods are simplified. (2) It is a brand new architecture based on Transformer. The learned object query detects objects in the current frame. The object feature query from the previous frame associates those current objects with the previous ones. (3) For the first time, we demonstrate that a much simpler and effective method based on the query-key mechanism and Transformer architecture can achieve a competitive 65.8% MOTA on the MOT17 challenge dataset. We hope TransTrack can provide a new perspective for multiple-object tracking. The code is available at: https://github.com/PeizeSun/TransTrack.
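The interplay of the two query types described in the abstract can be illustrated with a small sketch. It is not the TransTrack implementation: it assumes both query types have already produced boxes for the current frame, and it assumes a greedy IoU matching step with a threshold of 0.5, neither of which the abstract specifies. The names `iou`, `associate`, `track_boxes`, `det_boxes`, and `iou_thresh` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(
    track_boxes: Dict[int, Box],
    det_boxes: List[Box],
    next_id: int,
    iou_thresh: float = 0.5,  # assumed value, not taken from the abstract
) -> Tuple[Dict[int, Box], int]:
    """Match boxes from the two query types and assign track ids.

    track_boxes: boxes predicted from previous-frame object feature
                 queries, keyed by existing track id.
    det_boxes:   boxes predicted from the learned object queries.
    Unmatched detections are treated as new-coming objects.
    """
    # Assumption: greedy matching by descending IoU stands in for
    # whatever matching step a real tracker would use.
    pairs = sorted(
        (
            (iou(tb, db), tid, j)
            for tid, tb in track_boxes.items()
            for j, db in enumerate(det_boxes)
        ),
        reverse=True,
    )
    assigned: Dict[int, Box] = {}
    used_tracks, used_dets = set(), set()
    for score, tid, j in pairs:
        if score < iou_thresh:
            break
        if tid in used_tracks or j in used_dets:
            continue
        assigned[tid] = det_boxes[j]
        used_tracks.add(tid)
        used_dets.add(j)

    # Detections with no matching track start new tracks.
    for j, db in enumerate(det_boxes):
        if j not in used_dets:
            assigned[next_id] = db
            next_id += 1
    return assigned, next_id


tracks = {0: (0, 0, 10, 10), 1: (20, 20, 30, 30)}
dets = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]
result, next_free_id = associate(tracks, dets, next_id=2)
```

In this toy frame, the two existing tracks keep their ids and the third detection, which overlaps no track box, receives a fresh id.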


Generative Dual Adversarial Network for Generalized Zero-Shot Learning

Huang, Wang², et al. 2019



This paper studies the problem of generalized zero-shot learning, which requires the model to train on image-label pairs from some seen classes and test on the task of classifying new images from both seen and unseen classes. Most previous models try to learn a fixed one-directional mapping between visual and semantic space, while some recently proposed generative methods try to generate image features for unseen classes so that the zero-shot learning problem becomes a traditional fully-supervised classification problem. In this paper, we propose a novel model that provides a unified framework for three different approaches: visual → semantic mapping, semantic → visual mapping, and metric learning. Specifically, our proposed model consists of a feature generator that can generate various visual features given class embedding features as input, a regressor that maps each visual feature back to its corresponding class embedding, and a discriminator that learns to evaluate the closeness of an image feature and a class embedding. All three components are trained under the combination of cyclic consistency loss and dual adversarial loss. Experimental results show that our model not only preserves higher accuracy in classifying images from seen classes, but also performs better than existing state-of-the-art models in classifying images from unseen classes.
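The roles of the three components can be made concrete with a toy sketch. It is only an illustration under strong assumptions: the generator, regressor, and discriminator below are fixed hand-written functions rather than trained networks, the cyclic consistency loss is assumed to be a mean squared error, and the dual adversarial loss is omitted. All function names and constants are hypothetical.

```python
from typing import List

Vec = List[float]


def generator(class_emb: Vec, noise: Vec) -> Vec:
    """Toy semantic -> visual map: class embedding plus noise to a feature."""
    return [2.0 * a + z for a, z in zip(class_emb, noise)]


def regressor(visual: Vec) -> Vec:
    """Toy visual -> semantic map: feature back to a class embedding."""
    return [0.5 * v for v in visual]


def discriminator(visual: Vec, class_emb: Vec) -> float:
    """Toy closeness score in (0, 1]; 1.0 means a perfect match."""
    sq_dist = sum((v - 2.0 * a) ** 2 for v, a in zip(visual, class_emb))
    return 1.0 / (1.0 + sq_dist)


def cycle_consistency_loss(class_emb: Vec, noise: Vec) -> float:
    """Mean squared error between an embedding and its reconstruction.

    Assumption: MSE is used here for simplicity; the abstract does not
    state the exact form of the loss.
    """
    reconstructed = regressor(generator(class_emb, noise))
    return sum((r - a) ** 2 for r, a in zip(reconstructed, class_emb)) / len(class_emb)


emb = [1.0, -1.0]
loss_clean = cycle_consistency_loss(emb, [0.0, 0.0])
loss_noisy = cycle_consistency_loss(emb, [2.0, 2.0])
score_clean = discriminator(generator(emb, [0.0, 0.0]), emb)
```

With zero noise the toy regressor inverts the toy generator exactly, so the cycle loss vanishes and the discriminator gives its maximum score; adding noise makes the reconstruction drift and the loss grow.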


Edgel index for large-scale sketch-based image search

et al. 2011


Improving Convolutional Networks With Self-Calibrated Convolutions

Hou², et al. 2020


Image annotation refinement using random walk with restarts

et al. 2006


Multi-label sparse coding for automatic image annotation

et al. 2009


Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

Sun¹, Zhang², Jiang³, et al. 2020

Preprint





TransTrack: Multiple Object Tracking with Transformer
