Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3 (following Brown et al. (2020), we use GPT-3 to refer to both the 175B model and the smaller-scale models), while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.
Scaling semantic parsing models for task-oriented dialog systems to new languages is often expensive and time-consuming due to the lack of available datasets. Existing datasets suffer from several shortcomings: a) they cover few languages, b) they contain small numbers of labeled examples per language, and c) they are based on the simple intent and slot detection paradigm for non-compositional queries. In this paper, we present a new multilingual dataset, called MTOP, comprising 100k annotated utterances in 6 languages across 11 domains. We use this dataset and other publicly available datasets to conduct a comprehensive benchmarking study on using various state-of-the-art multilingual pre-trained models for task-oriented semantic parsing. We achieve an average improvement of +6.3 points in slot F1 on the two existing multilingual datasets over the best results reported in their experiments. Furthermore, we demonstrate strong zero-shot performance using pre-trained models combined with automatic translation and alignment, and a proposed distant supervision method to reduce the noise in slot label projection.
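To make the distinction the abstract draws concrete, the following is a minimal, hypothetical illustration (not an example drawn from MTOP, and the intent and slot names are invented) of a flat intent/slot annotation versus a compositional one in which a slot value is itself a nested query.

```python
# Hypothetical illustration of flat vs. compositional task-oriented parses.
# Intent/slot names are made up for the sketch; they are not MTOP labels.

flat = {
    "intent": "GET_WEATHER",
    "slots": {"location": "Paris", "date": "tomorrow"},  # one intent, flat slot values
}

compositional = {
    "intent": "CREATE_REMINDER",
    "slots": {
        # the value of the "todo" slot is itself another intent with its own slots
        "todo": {"intent": "SEND_MESSAGE", "slots": {"recipient": "mom"}},
        "date_time": "at 6 pm",
    },
}

print(flat["intent"], "->", flat["slots"])
print(compositional["intent"], "->", list(compositional["slots"]))
```

The flat paradigm assigns exactly one intent per utterance and non-nested slots, which is why it cannot represent the second, compositional query.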
Mixture of Experts (MoE) layers enable efficient scaling of language models through conditional computation. This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full fine-tuning. With the exception of fine-tuning, we find MoEs to be substantially more compute-efficient. At more modest training budgets, MoEs can match the performance of dense models using ∼4 times less compute. This gap narrows at scale, but our largest MoE model (1.1T parameters) consistently outperforms a compute-equivalent dense model (6.7B parameters). Overall, this performance gap varies greatly across tasks and domains, suggesting that MoE and dense models generalize differently in ways that are worthy of future study. We make our code and models publicly available for research use.
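For readers unfamiliar with the conditional computation mentioned above, the sketch below shows the core idea behind an MoE layer: a learned router sends each token to a single expert feed-forward network, so total parameters grow with the number of experts while per-token compute stays roughly constant. This is a minimal top-1 routing sketch for illustration, not the implementation released with the paper.

```python
# Minimal top-1 MoE layer sketch (illustrative only, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top1MoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (num_tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)   # routing probabilities per token
        top_p, top_i = gate.max(dim=-1)            # pick one expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_i == e                      # tokens routed to expert e
            if mask.any():
                # only the selected expert runs on these tokens (conditional compute)
                out[mask] = top_p[mask, None] * expert(x[mask])
        return out


tokens = torch.randn(8, 512)
print(Top1MoE(d_model=512, d_ff=2048, num_experts=4)(tokens).shape)  # torch.Size([8, 512])
```

Adding experts multiplies the layer's parameter count, but each token still passes through only one expert's feed-forward computation, which is the source of the compute efficiency the abstract reports.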
OBJECTIVE: To investigate the prevalence of medical adhesive-related skin injuries (MARSIs) and associated risk factors in a pediatric ICU (PICU). METHODS: A cross-sectional design was adopted in the PICU of a university-based children’s hospital in eastern China. A total of 232 patients were enrolled, and 611 person-days were analyzed. MAIN OUTCOME MEASURES: Researchers assessed all patients daily for 2 weeks. The use of adhesives and prevalence of MARSIs were recorded. The patients’ clinical data were also collected. The prevalence of MARSIs was calculated daily, and the risk factors were examined statistically. MAIN RESULTS: The MARSI prevalence ranged from 23.53% to 54.17% (mean, 37.15%). Multivariate analysis identified being female, age 2 years or younger, hospital stays longer than 5 days, infection, edema, and surgery as independent risk factors. Prevalence by product ranged from 19 to 53 per 1,000 product-days with a mean of 34 MARSIs per 1,000 product-days. The major MARSI types were epidermal stripping and skin tear. The face was the most common MARSI site, and tracheal intubation was the most common inciting condition. Implicated products were acrylate tapes with elastic cloth backings. CONCLUSIONS: Researchers concluded that MARSI is common in the PICU. Skin stripping and skin tear were the most common types, and the face was the most vulnerable site for MARSI, typically attributable to the cloth tape used to affix tracheal intubation. Careful attention should be paid to children with identified risk factors (females, age 2 years or younger, longer hospital stays, edema, infection, or surgery).