While Active Learning (AL) techniques are explored in Neural Machine Translation (NMT), only a few works focus on tackling low annotation budgets where a limited number of sentences can get translated. Such situations are especially challenging and can occur for endangered languages with few human annotators or having cost constraints to label large amounts of data. Although AL is shown to be helpful with large budgets, it is not enough to build high-quality translation systems in these low-resource conditions. In this work, we propose a cost-effective training procedure to increase the performance of NMT models utilizing a small number of annotated sentences and dictionary entries. Our method leverages monolingual data with self-supervised objectives and a small-scale, inexpensive dictionary for additional supervision to initialize the NMT model before applying AL. We show that improving the model using a combination of these knowledge sources is essential to exploit AL strategies and increase gains in lowresource conditions. We also present a novel AL strategy inspired by domain adaptation for NMT and show that it is effective for low budgets. We propose a new hybrid data-driven approach, which samples sentences that are diverse from the labelled data and also most similar to unlabelled data. Finally, we show that initializing the NMT model and further using our AL strategy can achieve gains of up to 13 BLEU compared to conventional AL methods.
The replication crisis poses important challenges to modern science. Central to this challenge is re-establishing ground truths, or the most fundamental theories that serve as the bedrock to a scientific community. However, the goal to identify hypotheses with the greatest support is nontrivial given the current unprecedented rate of scientific publishing. In this era of high-volume science, the goal of this study is to sample from one research community within clinical neuroscience (traumatic brain injury; TBI) and track major trends that have shaped this literature over the past 50 years. To do so, we first conduct decade-wise (1980-2019) network analysis to examine the scientific communities that shape this literature. To establish the robustness of our network findings, we utilized searches from separate search engines (Web of Science; Semantic Scholar). As a second goal, we sought to determine the most highly cited hypotheses influencing the literature in each decade. In a third goal, we then searched for any papers referring to “replication” or efforts to reproduce findings within our >50,000 paper dataset. From this search, 550 papers were analyzed to determine the frequency and nature of formal replication studies over time. Finally, to maximize transparency, we provide a detailed procedure for the creation and analysis of our dataset, including a discussion of each of our major decision points, to facilitate similar efforts in other areas of neuroscience. We found that the unparalleled rate of scientific publishing within the TBI literature combined with the scarcity of clear hypotheses in individual publications are a challenge to both evaluating accepted findings and determining future directions. Additionally, while the conversation about reproducibility has increased over the past decade, the rate of published replication studies continues to be a negligible proportion of the research. Meta-science and computational methods offer the critical opportunity to assess the state of the science and illuminate pathways forward, but ultimately there is structural change needed in the TBI literature and perhapsothers.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.