2021
DOI: 10.48550/arxiv.2110.00603
Preprint

Algorithm Fairness in AI for Medicine and Healthcare

Abstract: In the current development and deployment of many artificial intelligence (AI) systems in healthcare, algorithm fairness is a challenging problem in delivering equitable care. Recent evaluations of AI models stratified across race subpopulations have revealed enormous inequalities in how patients are diagnosed, given treatments, and billed for healthcare costs. In this perspective article, we summarize the intersectional field of fairness in machine learning through the context of current issues in healthcare,…

Cited by 6 publications (5 citation statements)
References 162 publications (199 reference statements)
“…We measure the performance gap in diagnosis AUC between the advantaged and disadvantaged subgroups as an indicator of group fairness. This is in line with the "separability" criterion (Chen et al., 2021; Dwork et al., 2012) that algorithm scores should be conditionally independent of the sensitive attribute given the diagnostic label (i.e., Ŷ ⊥ S | Y), which is also adopted by Gardner et al. (2019) and Fong et al. (2021). On the other hand, Zietlow et al. (2022) find that for high-capacity models in computer vision, this is typically achieved by worsening the performance of the advantaged group rather than improving that of the disadvantaged group, a phenomenon termed leveling down in philosophy that has drawn numerous criticisms (Christiano and Braynen, 2008; Brown, 2003; Doran, 2001).…”
Section: Fairness Definition in Medicine (supporting)
confidence: 84%
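To make the group-fairness indicator in the statement above concrete, the sketch below computes per-subgroup diagnosis AUC and their gap on synthetic data; the variables (`scores`, `S`, `Y`) and the noise model are hypothetical assumptions, not the cited authors' code. Under the separation criterion (Ŷ ⊥ S | Y), the two AUCs should coincide, so their difference serves as the gap.

```python
# Minimal sketch, assuming synthetic data: measure the diagnosis-AUC gap
# between an advantaged and a disadvantaged subgroup (group fairness).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

n = 10_000
S = rng.integers(0, 2, size=n)           # sensitive attribute: 0 = advantaged, 1 = disadvantaged
Y = rng.integers(0, 2, size=n)           # ground-truth diagnostic label
noise = np.where(S == 1, 1.5, 1.0)       # hypothetical: noisier scores for S == 1
scores = Y + noise * rng.normal(size=n)  # model risk scores

# Per-group AUC; under separation (scores independent of S given Y) the two
# values match, so their difference is the group-fairness gap.
auc = {s: roc_auc_score(Y[S == s], scores[S == s]) for s in (0, 1)}
print(f"AUC advantaged:    {auc[0]:.3f}")
print(f"AUC disadvantaged: {auc[1]:.3f}")
print(f"AUC gap:           {auc[0] - auc[1]:.3f}")
```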
“…Apart from interpretability, few studies have addressed fairness and robustness issues. In precision medicine, we aim for a fair system that provides personalized and equitable treatment to each individual without any bias [47]. Chen et al. [47] gave the example of biased systems in healthcare: an algorithm trained only on USA cancer pathology data may produce wrong classifications when deployed on data from Turkish cancer patients, due to protocol variations or population shifts (imbalanced data).…”
Section: Causality in Healthcare Through SCM Framework (mentioning)
confidence: 99%
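The population-shift failure mode described in this statement can be sketched in a few lines: a classifier fit on one cohort is evaluated on an external cohort whose feature-outcome relationship differs, standing in for protocol variation between sites. The cohorts, coefficients, and `make_cohort` helper are hypothetical, not data from the cited study.

```python
# Minimal sketch, assuming synthetic cohorts: a model trained on one
# population degrades on an external cohort with a shifted feature-outcome
# relationship (e.g., protocol variation between hospitals or countries).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

def make_cohort(n, coefs):
    """Generate a hypothetical cohort whose outcome follows the given coefficients."""
    X = rng.normal(size=(n, 5))
    p = 1.0 / (1.0 + np.exp(-(X @ coefs)))
    return X, rng.binomial(1, p)

internal = np.array([1.0, -0.5, 0.8, 0.0, 0.3])    # training population
external = np.array([0.2, -0.5, 0.1, 1.0, 0.3])    # shifted relationship elsewhere

X_tr, y_tr = make_cohort(5_000, internal)
X_in, y_in = make_cohort(2_000, internal)          # held-out, same population
X_ex, y_ex = make_cohort(2_000, external)          # external cohort

model = LogisticRegression(max_iter=1_000).fit(X_tr, y_tr)
print("internal AUC:", round(roc_auc_score(y_in, model.predict_proba(X_in)[:, 1]), 3))
print("external AUC:", round(roc_auc_score(y_ex, model.predict_proba(X_ex)[:, 1]), 3))
```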
“…Artificial intelligence (AI), machine learning (ML), and data-driven technologies are expected to deliver novel ways of understanding and improving mental healthcare. 1 In healthcare applications of AI/ML generally, there has been increased focus on the potential for unintended harm arising from biases present in data 2 and resulting from model assumptions. Two striking examples are racial biases in an algorithm deployed to identify increased healthcare needs 3 and commonly used models for estimating renal function (employing standard biostatistical methods) that have been shown to be poorly calibrated for estimating kidney disease in people of colour.…”
Section: Introduction (mentioning)
confidence: 99%
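Since the statement above turns on miscalibration within a subgroup, a quick way to surface such a problem is to compare calibration curves per group rather than overall. The data, the 0.6 shrinkage factor, and the group structure below are hypothetical assumptions for illustration, not the cited renal-function models.

```python
# Minimal sketch, assuming synthetic risks: a model can look calibrated
# overall while systematically underestimating risk for one subgroup.
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(2)

n = 20_000
group = rng.integers(0, 2, size=n)          # subgroup membership
true_risk = rng.uniform(0.05, 0.6, size=n)  # underlying event probability
y = rng.binomial(1, true_risk)

# Hypothetical bias: predictions underestimate risk for group 1 only.
pred = np.where(group == 1, 0.6 * true_risk, true_risk)

for g in (0, 1):
    obs, exp = calibration_curve(y[group == g], pred[group == g], n_bins=5)
    print(f"group {g}: mean predicted -> observed event rate per bin")
    for e, o in zip(exp, obs):
        print(f"  {e:.2f} -> {o:.2f}")
```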