Active learning (AL) is designed to construct a high-quality labeled dataset by iteratively selecting the most informative samples. Such sampling relies heavily on data representation, and pre-training has recently become popular for robust feature learning. However, because pre-training uses low-level pretext tasks that lack annotation, directly using the pre-trained representation in AL is inadequate for determining the sampling score. To address this problem, we propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance to select diverse and instructive samples near the decision boundary. DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator. The diversity indicator constructs two feature spaces based on the pre-training pretext model and the downstream knowledge from annotation, by which it locates the neighbors of unlabeled data from the downstream space in the pretext space to explore the interactions among samples. With this mechanism, DOKT unifies the data relations of low-level and high-level representations to estimate traceback diversity. Next, in the uncertainty estimator, domain mixing is designed to apply perceptual perturbations to unlabeled samples that share similar visual patches in the pretext space. The divergence of the perturbed samples is then measured to estimate the domain uncertainty. As a result, DOKT selects the most diverse and important samples based on these two modules. Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods and generalizes well to various application scenarios such as semantic segmentation and image captioning.
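As a rough illustration of the traceback idea (not the paper's actual implementation), the sketch below scores each unlabeled sample by finding its nearest labeled neighbors in a downstream feature space and measuring how far those neighbors lie from it in the pretext space. The function name, the two feature matrices, and the simple mean-distance score are all assumptions made for this sketch.

```python
import numpy as np

def traceback_diversity(pretext_feats, downstream_feats,
                        unlabeled_idx, labeled_idx, k=5):
    """Hypothetical traceback-diversity score: large values mean the
    downstream and pretext representations disagree about a sample's
    neighborhood, suggesting it is diverse and worth labeling."""
    scores = {}
    labeled_idx = np.asarray(labeled_idx)
    for i in unlabeled_idx:
        # k nearest labeled neighbors in the downstream (annotation) space
        d = np.linalg.norm(downstream_feats[labeled_idx]
                           - downstream_feats[i], axis=1)
        nn = labeled_idx[np.argsort(d)[:k]]
        # trace those neighbors back into the pretext space; a large mean
        # distance there signals disagreement between the two spaces
        scores[i] = np.linalg.norm(pretext_feats[nn]
                                   - pretext_feats[i], axis=1).mean()
    return scores
```

In an AL loop, the highest-scoring samples would then be combined with an uncertainty term before querying annotations.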
Most image captioning models achieve superior performance with the help of large-scale supervised training data, but labeling image captions is prohibitively costly. To solve this problem, we propose a structural semantic adversarial active learning (SSAAL) model that leverages both visual and textual information to derive the most representative samples while maximizing image captioning performance. SSAAL consists of a semantic constructor, a snapshot&caption (SC) supervisor, and a labeled/unlabeled state discriminator. The constructor is designed to generate a structural semantic representation describing the objects, attributes, and object relationships in the image. The SC supervisor supervises this representation at the word level and sentence level in a multi-task learning manner, which directly relates the representation to ground-truth captions and updates it during the caption-generation process. Finally, we introduce a state discriminator to predict the sample state and select images with sufficient semantic and fine-grained diversity. Extensive experiments on a standard captioning dataset show that our model outperforms other active learning methods and achieves competitive performance while selecting only a small number of samples.

CCS CONCEPTS: • Computing methodologies → Active learning settings; Knowledge representation and reasoning.
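The labeled/unlabeled state discriminator can be illustrated with a deliberately simplified stand-in: a plain logistic-regression classifier trained to separate labeled-pool features from unlabeled-pool features, after which the samples most confidently marked "unlabeled" (i.e., least represented in the labeled pool) are queried. The function names and the NumPy training loop are assumptions for this sketch, not the paper's adversarial architecture.

```python
import numpy as np

def train_state_discriminator(labeled_feats, unlabeled_feats,
                              lr=0.1, steps=500):
    """Logistic-regression stand-in for a state discriminator:
    class 0 = labeled pool, class 1 = unlabeled pool."""
    X = np.vstack([labeled_feats, unlabeled_feats])
    y = np.r_[np.zeros(len(labeled_feats)), np.ones(len(unlabeled_feats))]
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities
        g = p - y                               # logistic-loss gradient
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def select_batch(w, b, unlabeled_feats, budget):
    """Query the samples the discriminator most confidently labels
    as 'unlabeled' (highest predicted probability of class 1)."""
    p = 1.0 / (1.0 + np.exp(-(unlabeled_feats @ w + b)))
    return np.argsort(-p)[:budget].tolist()
```

After each AL round the newly labeled samples would move into the labeled pool and the discriminator would be retrained.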