Mitigating Gender Bias in Natural Language Processing: Literature Review

Sun, Tony; Gaut, Andrew; Tang, Shirlyn; Huang, Yuxin; ElSherief, Mai; Zhao, Jieyu; Mirza, Diba; Belding, Elizabeth; Chang, Kai-Wei; Wang, William Yang

doi:10.18653/v1/p19-1159

Cited by 317 publications

(268 citation statements)

References 48 publications

Supporting

Mentioning

228

Contrasting

Unclassified


“…For instance, dense vector representations called word embeddings [86] are able to capture semantic relationships between words, such as sex, gender and ethnic relationships [87], thus absorbing biases existing in the training corpus [88]. Methods for bias mitigation in NLP have been recently reviewed, including learning gender-neutral embeddings and tagging the data points to preserve the gender of the source [89].…”

Section: Natural Language Processing (mentioning)

confidence: 99%

Sex and gender differences and biases in artificial intelligence for biomedicine and healthcare

et al. 2020


Precision Medicine implies a deep understanding of inter-individual differences in health and disease that are due to genetic and environmental factors. To acquire such understanding there is a need for the implementation of different types of technologies based on artificial intelligence (AI) that enable the identification of biomedically relevant patterns, facilitating progress towards individually tailored preventative and therapeutic interventions. Despite the significant scientific advances achieved so far, most of the currently used biomedical AI technologies do not account for bias detection. Furthermore, the design of the majority of algorithms ignores the sex and gender dimension and its contribution to health and disease differences among individuals. Failure in accounting for these differences will generate sub-optimal results and produce mistakes as well as discriminatory outcomes. In this review we examine the current sex and gender gaps in a subset of biomedical technologies used in relation to Precision Medicine. In addition, we provide recommendations to optimize their utilization to improve the global health and disease landscape and decrease inequalities.

“…Alternate models also exist that build embeddings from medical databases and the scientific literature, however for this paper we focus on the use of Word2Vec and GloVe, as opposed to the narrower datasets described in more detail in the paper by Kalyan et al [52]. As described by Pennington et al, GloVe embeddings were trained on text corpora from Wikipedia data, Gigaword and web data from Common Crawl which built a vocabulary of 400,000 frequent words [57]. Word2Vec was trained on the Google News dataset (containing ~100 billion words) which resulted in a model of 300-dimensional vectors for 3 million words and phrases [58].…”

Section: PLoS ONE (mentioning)

confidence: 99%

Artificial Intelligence in mental health and the biases of language based models

Straw, Callison-Burch 2020

PLoS ONE


Background: The rapid integration of Artificial Intelligence (AI) into the healthcare field has occurred with little communication between computer scientists and doctors. The impact of AI on health outcomes and inequalities calls for health professionals and data scientists to make a collaborative effort to ensure historic health disparities are not encoded into the future. We present a study that evaluates bias in existing Natural Language Processing (NLP) models used in psychiatry and discuss how these biases may widen health inequalities. Our approach systematically evaluates each stage of model development to explore how biases arise from a clinical, data science and linguistic perspective. Design/Methods: A literature review of the uses of NLP in mental health was carried out across multiple disciplinary databases with defined MeSH terms and keywords. Our primary analysis evaluated biases within ‘GloVe’ and ‘Word2Vec’ word embeddings. Euclidean distances were measured to assess relationships between psychiatric terms and demographic labels, and vector similarity functions were used to solve analogy questions relating to mental health. Results: Our primary analysis of mental health terminology in GloVe and Word2Vec embeddings demonstrated significant biases with respect to religion, race, gender, nationality, sexuality and age. Our literature review returned 52 papers, of which none addressed all the areas of possible bias that we identify in model development. In addition, only one article existed on more than one research database, demonstrating the isolation of research within disciplinary silos and inhibiting cross-disciplinary collaboration or communication. Conclusion: Our findings are relevant to professionals who wish to minimize the health inequalities that may arise as a result of AI and data-driven algorithms. We offer primary research identifying biases within these technologies and provide recommendations for avoiding these harms in the future.
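The two measurements named in the abstract above (Euclidean distance between term vectors, and analogy solving through a vector similarity function) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional `toy_vectors` are invented, the helper names are hypothetical, and cosine similarity is used as one common choice of similarity function; the abstract does not say which function the authors used, and real GloVe or Word2Vec vectors would be loaded from pretrained files.

```python
import math

# Invented 3-dimensional vectors, for illustration only. They are NOT taken
# from GloVe or Word2Vec, whose vectors have hundreds of dimensions.
toy_vectors = {
    "king":  [0.8, 0.9, 0.1],
    "queen": [0.8, 0.1, 0.9],
    "man":   [0.2, 0.9, 0.1],
    "woman": [0.2, 0.1, 0.9],
    "apple": [0.9, 0.5, 0.5],
}

def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' by vector arithmetic.

    Builds the target vector b - a + c and returns the vocabulary word
    (excluding a, b and c) whose vector is most cosine-similar to it.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates,
               key=lambda w: cosine_similarity(vectors[w], target))

if __name__ == "__main__":
    print(euclidean_distance(toy_vectors["man"], toy_vectors["woman"]))
    print(solve_analogy("man", "king", "woman", toy_vectors))
```

In a bias audit of the kind the abstract describes, the same distance would be computed between a term of interest and each demographic label, and systematically smaller distances to one label would be read as an association encoded in the embedding.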


“…Gender classification from text is a fundamental task in author profiling, and in particular author profiling on social media has recently received a lot of attention from the NLP community (Bamman et al, 2014; Sap et al, 2014; Ciot et al, 2013). Additionally, gender is often in the spotlight of research of fairness and bias in NLP (Sun et al, 2019). Biases are often introduced by demographic and other imbalances in training data.…”

Section: Gender Classification Bias (mentioning)

confidence: 99%

PANDORA Talks: Personality and Demographics on Reddit

Gjurković, Karan, Vukojević et al. 2020

Preprint


Personality and demographics are important variables in social sciences, while in NLP they can aid in interpretability and removal of societal biases. However, datasets with both personality and demographic labels are scarce. To address this, we present PANDORA, the first large-scale dataset of Reddit comments labeled with three personality models (including the well-established Big 5 model) and demographics (age, gender, and location) for more than 10k users. We showcase the usefulness of this dataset on three experiments, where we leverage the more readily available data from other personality models to predict the Big 5 traits, analyze gender classification biases arising from psycho-demographic variables, and carry out a confirmatory and exploratory analysis based on psychological theories. Finally, we present benchmark prediction models for all personality and demographic variables.
