The remarkable success of large language models has been driven by dense models trained on massive unlabeled, unstructured corpora. These corpora typically contain text from diverse, heterogeneous sources, but information about the source of the text is rarely used during training. Transferring their knowledge to a target domain is typically done by continuing training in-domain. In this paper, we introduce a method to permit domain adaptation to many diverse domains using a computationally efficient adapter approach. Our method is based on the observation that textual domains are partially overlapping, and we represent domains as a hierarchical tree structure where each node in the tree is associated with a set of adapter weights. When combined with a frozen pretrained language model, this approach enables parameter sharing among related domains, while avoiding negative interference between unrelated ones. Experimental results with GPT-2 and a large fraction of the 100 most represented websites in C4 show across-the-board improvements in-domain. We additionally provide an inference-time algorithm for a held-out domain and show that averaging over multiple paths through the tree enables further gains in generalization, while adding only a marginal cost to inference.
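A minimal PyTorch sketch of the kind of hierarchical adapter tree this abstract describes. The class and function names, the bottleneck sizes, the residual way adapters are applied on top of the frozen LM, and the uniform averaging over paths for a held-out domain are all illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class AdapterNode:
    """One node of the domain tree; holds bottleneck adapter weights shared by
    every domain in its subtree (hidden/bottleneck sizes are assumptions)."""
    def __init__(self, name, hidden_size=768, bottleneck=64):
        self.name = name
        self.children = {}
        self.adapter = nn.Sequential(
            nn.Linear(hidden_size, bottleneck),
            nn.ReLU(),
            nn.Linear(bottleneck, hidden_size),
        )

    def add_child(self, name, **kwargs):
        child = AdapterNode(name, **kwargs)
        self.children[name] = child
        return child

def path_adapters(root, path):
    """Collect adapters along a root-to-leaf path, e.g. ['web', 'news', 'bbc.com'];
    related domains share ancestor adapters, unrelated ones share only the root."""
    node, adapters = root, [root.adapter]
    for name in path:
        node = node.children[name]
        adapters.append(node.adapter)
    return adapters

def apply_path(frozen_hidden_states, adapters):
    """Apply one path's adapters residually on top of the frozen LM's activations."""
    h = frozen_hidden_states
    for adapter in adapters:
        h = h + adapter(h)
    return h

def average_over_paths(frozen_hidden_states, root, paths, lm_head):
    """For a held-out domain, ensemble several plausible root-to-leaf paths by
    averaging their next-token distributions (uniform weights are an assumption)."""
    probs = [torch.softmax(lm_head(apply_path(frozen_hidden_states,
                                              path_adapters(root, p))), dim=-1)
             for p in paths]
    return torch.stack(probs).mean(dim=0)
```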
Background: Close relationships in older adulthood are characterized by heightened interdependence, which has implications for health and well-being as partners age together. Purpose: We describe a novel method that uses partners’ spatial proximity to examine the dynamics of interpersonal relationships. Research Design: In a sample of 10 older adult couples over a 14-day study period, we linked a continuous measure of partners’ spatial proximity with partners’ heart rates—a physiological marker of arousal. Results: Cross-correlations showed that proximity was consistently associated with each partner’s heart rate, but the magnitude and sequence of the correlation varied from day to day, suggesting that the coupling of proximity and heart rate is a dynamic of the interaction, rather than of the couple. Additionally, our predictive model showed that all three time series were necessary for optimal prediction, demonstrating that proximity and partners’ heart rates are dynamically intertwined. Conclusion: Together, these results demonstrate meaningful and predictable variation in couple dynamics at the momentary level, consisting of a complex association between physiological arousal and spatial proximity.
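A hedged sketch of the kind of lagged cross-correlation analysis the abstract refers to, assuming evenly sampled proximity and heart-rate series for one partner. The variable names, lag window, and normalization are illustrative choices, not the authors' exact procedure.

```python
import numpy as np

def lagged_cross_correlation(proximity, heart_rate, max_lag=30):
    """Correlation between standardized proximity and heart-rate series at a range
    of lags (in samples). Positive lags shift heart rate forward relative to
    proximity, so the sign of the peak lag hints at which signal leads."""
    proximity = (proximity - proximity.mean()) / proximity.std()
    heart_rate = (heart_rate - heart_rate.mean()) / heart_rate.std()
    correlations = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            x, y = proximity[-lag:], heart_rate[:lag]
        elif lag > 0:
            x, y = proximity[:-lag], heart_rate[lag:]
        else:
            x, y = proximity, heart_rate
        correlations[lag] = float(np.mean(x * y))
    return correlations
```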
Using a language model (LM) pretrained on two languages with large monolingual data in order to initialize an unsupervised neural machine translation (UNMT) system yields state-of-the-art results. When limited data is available for one language, however, this method leads to poor translations. We present an effective approach that reuses an LM that is pretrained only on a high-resource language. The monolingual LM is fine-tuned on both languages and is then used to initialize a UNMT model. To reuse the pretrained LM, we have to modify its predefined vocabulary to account for the new language. We therefore propose a novel vocabulary extension method. Our approach, RE-LM, outperforms a competitive cross-lingual pretraining model (XLM) in English-Macedonian (En-Mk) and English-Albanian (En-Sq), yielding more than +8.3 BLEU points for all four translation directions. We release the code at https://github.com/alexandra-chron/relm_unmt.
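To make the vocabulary-extension idea concrete, here is a minimal sketch using the HuggingFace transformers library: new-language subwords are added to the tokenizer and the embedding matrix is grown to match. GPT-2 is used purely as a stand-in pretrained LM, the example subwords are hypothetical, and the default random initialization of the added rows is an assumption; RE-LM's actual extension method may differ.

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Stand-in for an LM pretrained only on the high-resource language.
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# Hypothetical subwords learned on the new, low-resource language.
new_language_subwords = ["токен", "зборови", "превод"]
num_added = tokenizer.add_tokens(new_language_subwords)

# Grow the embedding (and tied output) matrix to cover the new tokens;
# the added rows are randomly initialized before fine-tuning on both languages.
model.resize_token_embeddings(len(tokenizer))
```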
Successful methods for unsupervised neural machine translation (UNMT) employ cross-lingual pretraining via self-supervision, often in the form of a masked language modeling or sequence generation task, which requires the model to align the lexical- and high-level representations of the two languages. While cross-lingual pretraining works for similar languages with abundant corpora, it performs poorly for low-resource, distant languages. Previous research has shown that this is because the representations are not sufficiently aligned. In this paper, we enhance bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings. Empirical results demonstrate improved performance with our method on both UNMT (up to 4.5 BLEU) and bilingual lexicon induction, compared to an established UNMT baseline.
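One way to inject type-level cross-lingual subword embeddings into masked language model pretraining is to copy the pretrained vectors into the model's embedding table before training begins. The sketch below assumes a subword-to-index vocabulary and a subword-to-vector lookup; both names and the copy-in strategy are illustrative, not necessarily the paper's procedure.

```python
import torch
import torch.nn as nn

def init_from_crosslingual_embeddings(embedding: nn.Embedding,
                                      vocab: dict,
                                      pretrained: dict):
    """Copy type-level cross-lingual subword vectors into the model's embedding
    table before MLM pretraining. `vocab` maps subword -> row index and
    `pretrained` maps subword -> vector (both hypothetical inputs); subwords
    without a pretrained vector keep their random initialization."""
    with torch.no_grad():
        for subword, idx in vocab.items():
            vec = pretrained.get(subword)
            if vec is not None:
                embedding.weight[idx] = torch.as_tensor(
                    vec, dtype=embedding.weight.dtype)
```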
Engineering at the University of Illinois has steadily drawn increasingly larger incoming classes of students. With a significant population and a diverse set of programs, these freshmen enter a unique engineering culture. This study sought to understand the perspectives and experiences of students with regard to their engineering identity as they entered the university in the Fall of 2017. Differences in perceptions across demographics such as gender, ethnicity, and engineering major were also examined. A survey was administered to 1986 freshman engineering students within their first month of school. The survey contained questions about the students' perceived understanding of and confidence in engineering, as well as their reasons for pursuing engineering. Common perceptions of engineering qualities and responsibilities were also assessed. Based on survey results with a 23.3% response rate, students across all majors were confident in their ability to succeed, but female students reported lower levels of confidence than male students. The most common reasons students selected for pursuing engineering were their abilities in math and science, followed by having prior experience with engineering. However, female participants selected prior experience as a reason at a significantly lower rate than their male counterparts. Across the various engineering majors and programs, there were also differences in satisfaction levels. Students who were not in their first-choice major were less likely to agree that they were happy in their field or that they intended to stay in their major. Overall, however, participants rated themselves as having a good understanding of engineering and as planning to stay within engineering. The descriptors for engineers most commonly selected were 'Practical' and 'Analytical', while 'Artistic' and 'Kind' were selected least often. A brief description of a follow-up study is provided.
Generative language models are trained on diverse, general-domain corpora. However, this limits their applicability to narrower domains, and prior work has shown that continued in-domain training can provide further gains. In this paper, we introduce a method to scale domain adaptation to many diverse domains using a computationally efficient adapter approach. Our method is based on the observation that textual domains are partially overlapping, and we represent domains as a hierarchical tree structure where each node in the tree is associated with a set of adapter weights. When combined with a frozen pretrained language model, this approach enables parameter sharing among related domains, while avoiding negative interference between unrelated ones. It is efficient: computational cost scales as O(log(D)) for D domains. Experimental results with GPT-2 and a large fraction of the 100 most represented websites in C4 show across-the-board improvements in-domain. We additionally provide an inference-time algorithm for a held-out domain and show that averaging over multiple paths through the tree enables further gains in generalization, while adding only a marginal cost to inference.
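To make the O(log(D)) claim concrete, the tiny helper below counts how many adapter modules a single forward pass touches when the D domains sit at the leaves of a balanced tree. The balanced-tree shape and the branching factor are our assumptions for illustration.

```python
import math

def adapters_touched_per_forward(num_domains: int, branching: int = 2) -> int:
    """In a balanced tree with `branching` children per node and one leaf per
    domain, a single root-to-leaf path touches depth + 1 adapter modules, so
    per-domain cost grows as O(log D) rather than O(D)."""
    depth = math.ceil(math.log(max(num_domains, 1), branching))
    return depth + 1

# e.g. ~100 domains with a binary tree -> about 8 adapter modules per forward pass,
# versus 100 separate adapter sets if every domain were adapted independently.
print(adapters_touched_per_forward(100))
```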