2021
DOI: 10.48550/arxiv.2111.11665
Preprint
RadFusion: Benchmarking Performance and Fairness for Multimodal Pulmonary Embolism Detection from CT and EHR

Abstract: Despite the routine use of electronic health record (EHR) data by radiologists to contextualize clinical history and inform image interpretation, the majority of deep learning architectures for medical imaging are unimodal, i.e., they only learn features from pixel-level information. Recent research revealing how race can be recovered from pixel data alone highlights the potential for serious biases in models which fail to account for demographics and other key patient attributes. Yet the lack of imaging datas…

Cited by 9 publications (16 citation statements) | References 47 publications
“…Indeed, the first issue one encounters is that a large number of candidate measures exist. One can, for instance, evaluate fairness by comparing standard ML performance metrics across different sub-groups, such as accuracy [10,12–16] or AUC ROC (the area under the receiver operating characteristic curve) [8–10,14–22], among others. Alternatively, one can choose to employ one of the (no fewer than ten) different fairness-specific criteria formulated by the community [23] in order to audit the presence of bias in a given model [16,18].…”
Section: What Does It Mean For An Algorithm To Be Fair? (mentioning; confidence: 99%)
“…To complicate matters further, even if one carries out a multi-dimensional study by simultaneously employing multiple metrics [9,10,14–16,20,21,24], which model to select in the end in a given setting might be no trivial matter, and additional information will in general be required.…”
Section: What Does It Mean For An Algorithm To Be Fair? (mentioning; confidence: 99%)
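The sub-group comparison these statements describe can be sketched in a few lines: compute the same standard metric (here AUC ROC) separately on each protected group and inspect the gap. This is a generic illustration, not code from any of the cited works; the group labels, data, and seed are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical per-patient data: a protected attribute, ground-truth
# PE labels, and model probability scores.
rng = np.random.default_rng(0)
group = rng.choice(["A", "B"], size=400)
y_true = rng.integers(0, 2, size=400)
y_score = np.clip(y_true * 0.3 + rng.random(400), 0.0, 1.0)

# Compare a standard performance metric (AUC ROC) across sub-groups.
aucs = {}
for g in ["A", "B"]:
    mask = group == g
    aucs[g] = roc_auc_score(y_true[mask], y_score[mask])

# A simple disparity measure: the absolute per-group AUC gap.
gap = abs(aucs["A"] - aucs["B"])
print(aucs, gap)
```

The same loop works for accuracy or any other per-group metric; the fairness-specific criteria mentioned above (e.g., equalized odds) instead constrain per-group error rates directly rather than a single aggregate score.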
“…The multiple perspectives of medical data from different modalities provide information on patient treatment, allowing multimodal models to gradually show their unique advantages in the healthcare field [54]. Research on the fairness of multimodal models is still limited at the current stage [99,17]. A preliminary work presents RadFusion, a multimodal benchmark dataset consisting of 1794 patients with their corresponding EHR data and high-resolution computed tomography (CT) data [99].…”
Section: Other Data Types (mentioning; confidence: 99%)
“…The authors evaluate the performance of several representative multimodal fusion models on the diagnostic task of pulmonary embolism and benchmark their fairness on protected subgroups.…”
Section: Other Data Types (mentioning; confidence: 99%)
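A minimal sketch of the kind of fusion such benchmarks evaluate is late fusion: concatenate pre-extracted per-modality feature vectors (one from a CT encoder, one from an EHR encoder) and feed them to a shared classifier head. The feature dimensions, batch size, and linear head below are illustrative assumptions, not RadFusion's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    # Numerically plain logistic function for the classifier head.
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical pre-extracted features for a batch of 8 patients:
# 512-d from a CT encoder, 64-d from an EHR encoder.
ct_feat = rng.standard_normal((8, 512))
ehr_feat = rng.standard_normal((8, 64))

# Late fusion: concatenate per-modality features, then a linear classifier.
fused = np.concatenate([ct_feat, ehr_feat], axis=1)  # shape (8, 576)
w = rng.standard_normal(576) * 0.01
b = 0.0
prob_pe = sigmoid(fused @ w + b)  # predicted PE probability per patient
print(prob_pe.shape)  # (8,)
```

Fairness auditing then proceeds as in the earlier snippet: the per-patient probabilities are scored separately on each protected subgroup.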
“…As machine learning models become increasingly integrated into the healthcare setting, one primary concern is whether such models are being used in a fair and ethical way (Ahmad et al., 2020; Wawira Gichoya et al., 2021). In the field of machine learning for medical imaging, several prior works benchmark the degree of disparities between protected groups for machine learning models (Kinyanjui et al., 2020) and for CT scans (Zhou et al., 2021), though to our knowledge, our work is the first to benchmark algorithms for bias reduction in the medical imaging setting.…”
Section: Fairness In Computational Medical Imaging (mentioning; confidence: 99%)