In this paper, we present a novel Motion-Attentive Transition Network (MATNet) for zero-shot video object segmentation, which provides a new way of leveraging motion information to reinforce spatio-temporal object representation. An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder, which transforms appearance features into motion-attentive representations at each convolutional stage. In this way, the encoder becomes deeply interleaved, allowing for close hierarchical interactions between object motion and appearance. This is superior to the typical two-stream architecture, which treats motion and appearance separately in each stream and often suffers from overfitting to appearance information. Additionally, a bridge network is proposed to obtain a compact, discriminative and scale-sensitive representation of the multi-level encoder features, which is further fed into a decoder to produce the segmentation results. Extensive experiments on three challenging public benchmarks (i.e., DAVIS-16, FBMS and Youtube-Objects) show that our model achieves compelling performance against state-of-the-art methods. Code is available at: https://github.com/tfzhou/MATNet.
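To make the idea of an asymmetric motion-to-appearance attention block more concrete, here is a minimal sketch in PyTorch. It is not the authors' exact MAT design; the use of a channel gate plus a spatial gate, the reduction ratio, and the residual connection are illustrative assumptions.

```python
# Minimal sketch of an asymmetric motion-to-appearance attention block.
# NOT the authors' exact MAT design; the channel/spatial gating scheme,
# reduction ratio and residual connection are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionAttentiveTransition(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # channel attention inferred from motion features
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # spatial attention inferred from motion features
        self.spatial_conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, appearance, motion):
        # appearance, motion: (B, C, H, W) features from the two encoder streams
        b, c, _, _ = motion.shape
        # channel-wise gate from globally pooled motion features
        chn = torch.sigmoid(self.channel_fc(F.adaptive_avg_pool2d(motion, 1).view(b, c)))
        chn = chn.view(b, c, 1, 1)
        # spatial gate highlighting moving regions
        spa = torch.sigmoid(self.spatial_conv(motion))
        # asymmetric transition: motion attention modulates appearance features
        return appearance * chn * spa + appearance  # residual keeps original appearance cues


# toy usage at one convolutional stage
feats_app = torch.randn(2, 64, 32, 32)
feats_mot = torch.randn(2, 64, 32, 32)
out = MotionAttentiveTransition(64)(feats_app, feats_mot)
print(out.shape)  # torch.Size([2, 64, 32, 32])
```

Applying such a block at every encoder stage is what makes the two streams "deeply interleaved" rather than fused only at the end.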
The goal of this work is to automatically collect a large number of highly relevant images from the Internet for given queries. A novel image dataset construction framework is proposed by employing multiple textual metadata. Specifically, the given query is first expanded by searching the Google Books Ngrams Corpora to obtain a richer semantic description, from which the visually non-salient and less relevant expansions are then filtered out. After retrieving the relevant images from the Internet, we further filter the noisy images using a clustering-based method followed by progressive Convolutional Neural Network (CNN) training. To verify the effectiveness of the proposed method, we construct a dataset with 10 categories, which is not only much larger than, but also has cross-dataset generalization ability comparable to, the manually labelled datasets STL-10 and CIFAR-10. Moreover, our method achieves a higher average precision than previous works.
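As a rough illustration of the clustering-based filtering step, the sketch below keeps images that fall into reasonably large clusters and drops outliers. It assumes image features have already been extracted (e.g. by a pretrained CNN); the cluster count and the small-cluster threshold are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of a clustering-based noise filtering step, assuming image
# features have already been extracted (e.g. by a pretrained CNN). The cluster
# count and the "small cluster" threshold are illustrative assumptions, not the
# paper's exact settings.
import numpy as np
from sklearn.cluster import KMeans

def filter_noisy_images(features, n_clusters=10, min_cluster_frac=0.05):
    """Keep images that fall into reasonably large clusters; drop outliers."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(features)
    counts = np.bincount(labels, minlength=n_clusters)
    keep_clusters = np.where(counts >= min_cluster_frac * len(features))[0]
    keep_mask = np.isin(labels, keep_clusters)
    return keep_mask  # boolean mask over the retrieved images

# toy usage with random "features"
feats = np.random.rand(200, 512)
mask = filter_noisy_images(feats)
print(mask.sum(), "of", len(feats), "images kept")
```

The surviving images could then be used to train a CNN whose predictions progressively refine the selection, in the spirit of the progressive CNN step described above.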
Labelled image datasets have played a critical role in high-level image
understanding. However, the process of manual labelling is both time-consuming
and labor intensive. To reduce the cost of manual labelling, there has been
increased research interest in automatically constructing image datasets by
exploiting web images. Datasets constructed by existing methods tend to have a
weak domain adaptation ability, which is known as the "dataset bias problem".
To address this issue, we present a novel image dataset construction framework
that generalizes well to unseen target domains. Specifically, the given
queries are first expanded by searching the Google Books Ngrams Corpus to
obtain a rich semantic description, from which the visually non-salient and
less relevant expansions are filtered out. By treating each selected expansion
as a "bag" and the retrieved images as "instances", image selection can be
formulated as a multi-instance learning problem with constrained positive bags.
We propose to solve the resulting optimization problem using the cutting-plane and
concave-convex procedure (CCCP) algorithms. By using this approach, images from
different distributions can be kept while noisy images are filtered out. To
verify the effectiveness of our proposed approach, we build an image dataset
with 20 categories. Extensive experiments on image classification,
cross-dataset generalization, diversity comparison and object detection
demonstrate the domain robustness of our dataset.
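To illustrate the bag/instance formulation in this abstract, the sketch below uses a much simpler mi-SVM-style alternation rather than the paper's cutting-plane and CCCP solver: each selected query expansion is treated as a "bag" and its retrieved images as "instances", and every positive bag is constrained to keep at least one positive instance. All names and parameters are illustrative assumptions.

```python
# Illustrative sketch only: the paper solves a constrained-positive-bag MIL
# problem with cutting-plane and CCCP; this mi-SVM-style alternation is a far
# simpler heuristic meant to show the bag/instance setup, where each selected
# expansion is a "bag" and its retrieved images are "instances".
import numpy as np
from sklearn.svm import LinearSVC

def mi_svm(bags, negatives, n_iters=5, C=1.0):
    """bags: list of (n_i, d) instance-feature arrays, one per positive bag.
    negatives: (m, d) array of known-negative (noise) instances."""
    X_pos = np.vstack(bags)
    y_pos = np.ones(len(X_pos))          # initially treat every bag instance as positive
    X_neg, y_neg = negatives, -np.ones(len(negatives))
    clf = None
    for _ in range(n_iters):
        clf = LinearSVC(C=C).fit(np.vstack([X_pos, X_neg]),
                                 np.concatenate([y_pos, y_neg]))
        scores = clf.decision_function(X_pos)
        y_pos = np.where(scores >= 0, 1.0, -1.0)
        # constraint: every positive bag must retain at least one positive instance
        start = 0
        for bag in bags:
            end = start + len(bag)
            if (y_pos[start:end] > 0).sum() == 0:
                y_pos[start + np.argmax(scores[start:end])] = 1.0
            start = end
    return clf

# toy usage with synthetic features
rng = np.random.default_rng(0)
bags = [rng.normal(1.0, 1.0, (20, 16)) for _ in range(3)]
neg = rng.normal(-1.0, 1.0, (60, 16))
model = mi_svm(bags, neg)
```

Because instances from different bags (expansions) can all remain positive, images from different distributions are kept while instances relabelled as negative are filtered out, which is the intuition behind the domain robustness claimed above.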