English Natural Language Understanding (NLU) systems have achieved strong performance and have even outperformed humans on benchmarks such as GLUE and SuperGLUE. However, these benchmarks contain only textbook Standard American English (SAE); other dialects have been largely overlooked by the NLP community. This leads to biased and inequitable NLU systems that serve only a subpopulation of speakers. To understand the disparities in current models and to facilitate more dialect-competent NLU systems, we introduce the VernAcular Language Understanding Evaluation (VALUE) benchmark, a challenging variant of GLUE that we created using a set of lexical and morphosyntactic transformation rules. In this initial release (V.1), we construct rules for 11 features of African American Vernacular English (AAVE), and we recruit fluent AAVE speakers to validate each feature transformation through linguistic acceptability judgments, following a participatory design approach. Experiments show that these dialectal features can lead to a drop in model performance.
Recent research has demonstrated that racial biases against users who write African American English exist in popular toxic language datasets. While previous work has focused on a single fairness criterion, we propose using additional descriptive fairness metrics to better understand the source of these biases. We demonstrate that different benchmark classifiers, as well as two in-process bias-remediation techniques, propagate racial biases even in a larger corpus. We then propose a novel ensemble framework that uses a specialized classifier fine-tuned to the African American English dialect. We show that our proposed framework substantially reduces the racial biases that the model learns from these datasets. We demonstrate that the ensemble framework improves fairness metrics across all sample datasets with minimal impact on classification performance, and we provide empirical evidence of its ability to unlearn the annotation biases toward authors who use African American English. Please note that this work may contain examples of offensive words and phrases.
CCS Concepts: • Computing methodologies → Discourse, dialogue and pragmatics; • Human-centered computing → Empirical studies in collaborative and social computing.
Figure 1. Dorsoventral (A) and right lateral (B) radiographic views of the skull of a 12.5-year-old Prevost's squirrel (Callosciurus prevostii) evaluated because of right-sided facial swelling and a recent history of dysphagia and stridor.
History and Physical Examination Findings
A 12.5-year-old sexually intact male Prevost's squirrel (Callosciurus prevostii) was evaluated because of right-sided facial swelling and a recent history of dysphagia and stridor. The squirrel was anesthetized to allow for physical examination and collection of blood samples. Physical examination revealed a fracture of the maxillary right third premolar tooth and an associated abscess. Results of hematologic and serum biochemical testing were within reference limits for this species. While the squirrel was anesthetized, radiographs of the skull were obtained (Figure 1).
Determine whether additional imaging studies are required, or make your diagnosis from Figure 1.