In this paper, we present a novel Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation, which provides a new way of leveraging motion information to reinforce spatio-temporal object representation. An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder, which transforms appearance features into motion-attentive representations at each convolutional stage. In this way, the encoder becomes deeply interleaved, allowing for close hierarchical interactions between object motion and appearance. This is superior to the typical two-stream architecture, which treats motion and appearance separately in each stream and often suffers from overfitting to appearance information. Additionally, a bridge network is proposed to obtain a compact, discriminative and scale-sensitive representation of the multi-level encoder features, which is further fed into a decoder to produce the segmentation results. Extensive experiments on three challenging public benchmarks (i.e., DAVIS-16, FBMS and Youtube-Objects) show that our model achieves compelling performance against state-of-the-art methods. Code is available at: https://github.com/tfzhou/MATNet.
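To make the idea of an asymmetric motion-to-appearance attention block more concrete, here is a minimal sketch in PyTorch. It is not the authors' exact MAT design; the use of a channel gate plus a spatial gate, the reduction ratio, and the residual connection are illustrative assumptions.

```python
# Minimal sketch of an asymmetric motion-to-appearance attention block.
# NOT the authors' exact MAT design; the channel/spatial gating scheme,
# reduction ratio and residual connection are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionAttentiveTransition(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # channel attention inferred from motion features
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # spatial attention inferred from motion features
        self.spatial_conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, appearance, motion):
        # appearance, motion: (B, C, H, W) features from the two encoder streams
        b, c, _, _ = motion.shape
        # channel-wise gate from globally pooled motion features
        chn = torch.sigmoid(self.channel_fc(F.adaptive_avg_pool2d(motion, 1).view(b, c)))
        chn = chn.view(b, c, 1, 1)
        # spatial gate highlighting moving regions
        spa = torch.sigmoid(self.spatial_conv(motion))
        # asymmetric transition: motion attention modulates appearance features
        return appearance * chn * spa + appearance  # residual keeps original appearance cues


# toy usage at one convolutional stage
feats_app = torch.randn(2, 64, 32, 32)
feats_mot = torch.randn(2, 64, 32, 32)
out = MotionAttentiveTransition(64)(feats_app, feats_mot)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Applying such a block at every encoder stage is what makes the two streams "deeply interleaved" rather than fused only at the end.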
The goal of this work is to automatically collect a large number of highly relevant images from the Internet for given queries. A novel image dataset construction framework is proposed by employing multiple textual metadata. Specifically, the given query is first expanded by searching the Google Books Ngrams Corpora to obtain a richer semantic description, from which the visually non-salient and less relevant expansions are then filtered out. After retrieving the relevant images from the Internet, we further filter the noisy images using a clustering-based method followed by progressive Convolutional Neural Network (CNN) training. To verify the effectiveness of the proposed method, we construct a dataset with 10 categories, which is not only much larger than, but also has cross-dataset generalization ability comparable to, the manually labelled datasets STL-10 and CIFAR-10. Moreover, our method achieves a higher average precision than previous works.
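As a rough illustration of the clustering-based filtering step, the sketch below keeps images that fall into reasonably large clusters and drops outliers. It assumes image features have already been extracted (e.g. by a pretrained CNN); the cluster count and the small-cluster threshold are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of a clustering-based noise filtering step, assuming image
# features have already been extracted (e.g. by a pretrained CNN). The cluster
# count and the "small cluster" threshold are illustrative assumptions, not the
# paper's exact settings.
import numpy as np
from sklearn.cluster import KMeans

def filter_noisy_images(features, n_clusters=10, min_cluster_frac=0.05):
    """Keep images that fall into reasonably large clusters; drop outliers."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(features)
    counts = np.bincount(labels, minlength=n_clusters)
    keep_clusters = np.where(counts >= min_cluster_frac * len(features))[0]
    keep_mask = np.isin(labels, keep_clusters)
    return keep_mask  # boolean mask over the retrieved images

# toy usage with random "features"
feats = np.random.rand(200, 512)
mask = filter_noisy_images(feats)
print(mask.sum(), "of", len(feats), "images kept")
```

The surviving images could then be used to train a CNN whose predictions progressively refine the selection, in the spirit of the progressive CNN step described above.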
Labelled image datasets have played a critical role in high-level image
understanding. However, the process of manual labelling is both time-consuming
and labor intensive. To reduce the cost of manual labelling, there has been
increased research interest in automatically constructing image datasets by
exploiting web images. Datasets constructed by existing methods tend to have a
weak domain adaptation ability, which is known as the "dataset bias problem".
To address this issue, we present a novel image dataset construction framework
that generalizes well to unseen target domains. Specifically, the given
queries are first expanded by searching the Google Books Ngrams Corpus to
obtain a rich semantic description, from which the visually non-salient and
less relevant expansions are filtered out. By treating each selected expansion
as a "bag" and the retrieved images as "instances", image selection can be
formulated as a multi-instance learning problem with constrained positive bags.
We propose to solve the resulting optimization problem using the cutting-plane and
concave-convex procedure (CCCP) algorithms. By using this approach, images from
different distributions can be kept while noisy images are filtered out. To
verify the effectiveness of our proposed approach, we build an image dataset
with 20 categories. Extensive experiments on image classification,
cross-dataset generalization, diversity comparison and object detection
demonstrate the domain robustness of our dataset.
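To illustrate the bag/instance formulation in this abstract, the sketch below uses a much simpler mi-SVM-style alternation rather than the paper's cutting-plane and CCCP solver: each selected query expansion is treated as a "bag" and its retrieved images as "instances", and every positive bag is constrained to keep at least one positive instance. All names and parameters are illustrative assumptions.

```python
# Illustrative sketch only: the paper solves a constrained-positive-bag MIL
# problem with cutting-plane and CCCP; this mi-SVM-style alternation is a far
# simpler heuristic meant to show the bag/instance setup, where each selected
# expansion is a "bag" and its retrieved images are "instances".
import numpy as np
from sklearn.svm import LinearSVC

def mi_svm(bags, negatives, n_iters=5, C=1.0):
    """bags: list of (n_i, d) instance-feature arrays, one per positive bag.
    negatives: (m, d) array of known-negative (noise) instances."""
    X_pos = np.vstack(bags)
    y_pos = np.ones(len(X_pos))          # initially treat every bag instance as positive
    X_neg, y_neg = negatives, -np.ones(len(negatives))
    clf = None
    for _ in range(n_iters):
        clf = LinearSVC(C=C).fit(np.vstack([X_pos, X_neg]),
                                 np.concatenate([y_pos, y_neg]))
        scores = clf.decision_function(X_pos)
        y_pos = np.where(scores >= 0, 1.0, -1.0)
        # constraint: every positive bag must retain at least one positive instance
        start = 0
        for bag in bags:
            end = start + len(bag)
            if (y_pos[start:end] > 0).sum() == 0:
                y_pos[start + np.argmax(scores[start:end])] = 1.0
            start = end
    return clf

# toy usage with synthetic features
rng = np.random.default_rng(0)
bags = [rng.normal(1.0, 1.0, (20, 16)) for _ in range(3)]
neg = rng.normal(-1.0, 1.0, (60, 16))
model = mi_svm(bags, neg)
```

Because instances from different bags (expansions) can all remain positive, images from different distributions are kept while instances relabelled as negative are filtered out, which is the intuition behind the domain robustness claimed above.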