This paper explores a methodology for quantifying bias in transformer-based deep neural language models for Chinese, English, and French. When queried with health-related myth-busters on COVID-19, we observe a bias that is not of a semantic or encyclopaedic-knowledge nature, but rather a syntactic one, as predicted by theoretical insights into structural complexity. Our results highlight the need to create health-communication corpora as training sets for deep learning.
My foremost thanks go to my mentors, Luigi Rizzi and Adriana Belletti. My journey to the Left Periphery started in Siena when I was a master's student. I say grazie for these wonderful years in Geneva working under the ERC Advanced Grant 340297 ("SynCart"), which supported my doctoral studies. I would also like to say תודה to Ur Shlonsky for his help and for the beautiful conversations about linguistic and non-linguistic subjects. I am thankful to the members of the jury who honoured me by discussing this work: grazie Cecilia Poletto, danke Elisabeth Stark and Eric Haeberli. I gratefully acknowledge everyone in the Linguistics department of Geneva: Giuliano Bocci, Paola Merlo, Luka Nerima, Genoveva Puskás, inter alia. A special köszönöm goes to Eva Capitao. I am grateful to my colleagues, with whom I discussed my (still ongoing
The ability to assess the linguistic complexity of any given content could potentially improve knowledge reproduction, especially of tacit knowledge, which can be costly to transmit during a pandemic. In this paper, we develop a simple, crosslinguistic model of complexity that draws on formal accounts of linguistic systems but can be easily implemented by groups of non-linguists, e.g., communication experts and policymakers. To test our model, we conduct a study on a corpus extracted from the World Health Organization (WHO)'s emergency learning platform in six languages. Data extracted from open-access encyclopaedic entries serve as control groups. The results show that the measurements adopted signal a trend towards the minimization of complexity and can be exploited as features for (automatic) text classification.
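The idea of turning complexity measurements into classification features can be sketched as follows. This is a minimal illustration, not the model developed in the paper: the two surface features (mean sentence length and mean word length) and the threshold are hypothetical stand-ins for whatever measurements the study actually adopts.

```python
import re

def complexity_features(text):
    """Return a small vector of illustrative surface-complexity measures:
    mean sentence length (tokens per sentence) and mean word length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    mean_sent_len = len(words) / len(sentences)
    mean_word_len = sum(len(w.strip(".,;!?")) for w in words) / len(words)
    return [mean_sent_len, mean_word_len]

def classify(features, threshold=15.0):
    """Toy rule: label a text 'simplified' when its mean sentence length
    falls below a hypothetical threshold, else 'standard'."""
    return "simplified" if features[0] < threshold else "standard"

# Short, simple sentences, as in health-communication material:
label = classify(complexity_features("Wash your hands. Stay at home."))
```

In a realistic setting these features would feed a trained classifier rather than a single threshold; the point is only that crosslinguistic complexity measures can be computed cheaply and used as classification features.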
Discontinuous dependencies are one of the hallmarks of human languages. The investigation of the locality constraints imposed on such long-distance dependencies is a core aspect of syntactic explanation. The aim of this work is to investigate locality constraints in object relative clauses from a theory-driven and quantitative point of view. Based on a comparison of the theoretically expected and the observed counts of features of object relative clauses, we study which set of features plays a role in the syntactic computation of locality (type, number, animacy). We find both effects predicted by a narrow and by a broad view of intervention locality. For example, in Italian the feature number triggers a numerically stronger effect than in English, a prediction of the narrow, grammar-driven view of locality. We also find that the feature animacy plays a role in the frequency of object relative clauses, an effect predicted by a broader view of locality.
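The expected-vs-observed comparison described above can be illustrated with a Pearson chi-square statistic. The counts below are invented for illustration (e.g., number-matching vs. number-mismatching object relatives), and the chi-square statistic is only one possible choice of divergence measure, not necessarily the one the study employs.

```python
def chi_square_stat(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories,
    measuring how far observed feature counts deviate from the counts
    expected under a theoretical baseline."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: 30 number-matching vs. 70 number-mismatching
# object relatives, against a uniform 50/50 theoretical expectation.
stat = chi_square_stat([30, 70], [50, 50])
```

A larger statistic indicates a stronger deviation from the theoretical expectation; comparing such statistics across languages (e.g., Italian vs. English for the feature number) is one way to quantify which features matter for locality.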