Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards, and robustness analysis results are publicly available in the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).
With the growing rates of vaccination against coronavirus disease 2019 (COVID-19) across the globe, rare side effects have been increasingly noticed on a post-marketing basis. Cases of myocarditis and pericarditis have been reported in the literature following COVID-19 messenger RNA (mRNA) vaccination. However, diffuse alveolar hemorrhage (DAH) following vaccination has not been reported. DAH is a life-threatening clinicopathological entity characterized by bleeding into the alveolar space from the pulmonary microvasculature. It presents a diagnostic challenge in the setting of acute respiratory failure, requiring prompt suspicion and workup. We report a case of a 59-year-old male with a recent COVID-19 infection who presented with DAH within eight hours of the first dose of mRNA vaccination (Moderna, Cambridge, MA). Bronchoalveolar lavage was performed, along with imaging of the chest, to confirm the diagnosis. Immunological workup, including rheumatoid factor, anti-citrullinated peptide, anti-neutrophil cytoplasmic antibodies (P-ANCA and C-ANCA), anti-glomerular basement membrane antibodies, anti-double-stranded DNA, C3 and C4 complement levels, and cryoglobulin, was negative. Infectious workup with cultures and PCR from bronchial lavage was also negative. In the absence of any other cause, the etiology was deemed likely to be vaccine-induced DAH. Herein, we also discuss the possible mechanism of vaccine-related DAH and emphasize the need for further studies on vaccine-related adverse events.
Data augmentation is an important method for evaluating the robustness of natural language processing (NLP) models and for enhancing the diversity of their training data. In this paper, we present NL-Augmenter, a new participatory Python-based natural language (NL) augmentation framework which supports the creation of transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of NL tasks annotated with noisy descriptive tags. The transformations incorporate noise, intentional and accidental human mistakes, socio-linguistic variation, semantically-valid style, syntax changes, as well as artificial constructs that are unambiguous to humans. We demonstrate the efficacy of NL-Augmenter by using its transformations to analyze the robustness of popular language models. We find different models to be differently challenged on different tasks, with quasi-systematic score decreases. The infrastructure, datacards, and robustness evaluation results are publicly available on GitHub for the benefit of researchers working on paraphrase generation, robustness analysis, and low-resource NLP.
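To make the distinction between a transformation and a filter concrete, here is a minimal sketch in plain Python. It does not use or reproduce the NL-Augmenter API; the function names (`swap_adjacent_chars`, `is_long_sentence`, `split_by_filter`) and their parameters are hypothetical and chosen only for illustration. The transformation injects typo-like noise, and the filter splits a dataset on a simple length feature.

```python
import random
from typing import Callable, List, Tuple


def swap_adjacent_chars(sentence: str, prob: float = 0.1, seed: int = 0) -> str:
    """Transformation (hypothetical): swap adjacent letters to mimic typos."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    chars = list(sentence)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < prob:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so a letter moves at most once
        else:
            i += 1
    return "".join(chars)


def is_long_sentence(sentence: str, min_tokens: int = 8) -> bool:
    """Filter (hypothetical): true if the sentence has at least min_tokens words."""
    return len(sentence.split()) >= min_tokens


def split_by_filter(
    sentences: List[str], keep: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Split a dataset into the examples that pass the filter and the rest."""
    kept = [s for s in sentences if keep(s)]
    rest = [s for s in sentences if not keep(s)]
    return kept, rest


if __name__ == "__main__":
    data = [
        "The quick brown fox jumps over the lazy dog today",
        "Short sentence here",
    ]
    long_ones, short_ones = split_by_filter(data, is_long_sentence)
    print([swap_adjacent_chars(s, prob=0.3) for s in long_ones])
    print(short_ones)
```

A robustness check in this spirit would score a model on the original sentences and on their perturbed versions, then compare the two scores, optionally restricted to the subset selected by a filter.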
We present a novel approach to discovering domain models for introductory programming from textbooks, using an over-generation and ranking strategy. We first extract candidate key phrases from each chapter of a Computer Science textbook on introductory programming and then rank those candidates according to a number of metrics, such as the standard tf-idf weight used in information retrieval and metrics produced by other text ranking algorithms. Specifically, we conduct our work in the context of developing an intelligent tutoring system for source code comprehension, which requires a specification of the key programming concepts: the system monitors students' performance on those concepts and scaffolds their learning process until they show mastery. Our experiments on programming concepts taught in Java textbooks indicate that statistical methods such as KP Miner are quite competitive with more sophisticated methods. Automated discovery of domain models will lead to more scalable Intelligent Tutoring Systems (ITSs) across topics and domains, a major challenge that needs to be addressed if ITSs are to be used by millions of learners across many domains.
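The over-generation and ranking idea can be sketched with tf-idf alone. The snippet below is an illustration under stated assumptions, not the authors' pipeline: candidates are over-generated as all word n-grams up to two words (an assumed candidate generator), each chapter is treated as one document, and the names `candidate_phrases` and `rank_by_tfidf` are hypothetical. It does not implement KP Miner or any of the other ranking algorithms mentioned above.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def candidate_phrases(text: str, max_len: int = 2) -> List[str]:
    """Over-generate candidates: every word n-gram of up to max_len words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    ]


def rank_by_tfidf(
    chapters: Dict[str, str], top_k: int = 3
) -> Dict[str, List[Tuple[str, float]]]:
    """Rank each chapter's candidates by tf-idf, one chapter per document."""
    counts = {name: Counter(candidate_phrases(text)) for name, text in chapters.items()}
    n_docs = len(counts)

    # Document frequency: in how many chapters does each candidate occur?
    doc_freq: Counter = Counter()
    for chapter_counts in counts.values():
        doc_freq.update(chapter_counts.keys())

    ranked = {}
    for name, chapter_counts in counts.items():
        total = sum(chapter_counts.values())
        scores = {
            phrase: (tf / total) * math.log(n_docs / doc_freq[phrase])
            for phrase, tf in chapter_counts.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked[name] = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    return ranked


if __name__ == "__main__":
    toy_chapters = {
        "loops": "a for loop repeats a block a while loop repeats a block",
        "classes": "a class defines a block of fields and methods",
    }
    for chapter, phrases in rank_by_tfidf(toy_chapters).items():
        print(chapter, phrases)
```

With this weighting, a phrase that occurs in every chapter gets a score of zero, which is why very common words drop to the bottom of each chapter's ranking while chapter-specific terms rise to the top.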