2020
DOI: 10.1093/jamia/ocaa164
Reporting of demographic data and representativeness in machine learning models using electronic health records

Abstract: Objective: The development of machine learning (ML) algorithms to address a variety of issues faced in clinical practice has increased rapidly. However, questions have arisen regarding biases in their development that can affect their applicability in specific populations. We sought to evaluate whether studies developing ML models from electronic health record (EHR) data report sufficient demographic data on the study populations to demonstrate representativeness and reproducibility. …
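As a concrete illustration of what "demonstrating representativeness" can involve in practice, the sketch below compares a study cohort's demographic mix against a target population using a chi-square goodness-of-fit test. This is a minimal sketch of one possible check, not a method from the paper; all counts, proportions, and group labels are hypothetical.

```python
# Minimal sketch (assumptions ours, not the paper's) of a representativeness
# check that adequate demographic reporting makes possible: comparing a study
# cohort's demographic mix against a target population.
from scipy.stats import chisquare

cohort_counts = {"White": 720, "Black": 150, "Hispanic": 80, "Asian": 50}   # hypothetical cohort
target_props = {"White": 0.60, "Black": 0.18, "Hispanic": 0.14, "Asian": 0.08}  # hypothetical target

groups = list(cohort_counts)
observed = [cohort_counts[g] for g in groups]
n = sum(observed)
# Counts we would expect if the cohort mirrored the target population.
expected = [target_props[g] * n for g in groups]

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.1f}, p = {p:.3g}")
for g in groups:
    print(f"{g:>8}: cohort {cohort_counts[g] / n:.1%} vs target {target_props[g]:.1%}")
```

A small p-value here would flag a cohort whose demographic mix departs from the intended target population; the point of the paper is that without reported demographics, readers cannot run even this basic comparison.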

Cited by 35 publications (35 citation statements)
References 31 publications
“…Past research in other medical fields has revealed that machine learning models parameterized and trained on patient populations whose characteristics differ from those of the target population can produce biased predictions [13][14][15][16][17]. We sought to assess whether this was the case for our KF models as well.…”
Section: Journal Pre-proof
confidence: 99%
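The failure mode this statement describes can be demonstrated with a small synthetic sketch (ours, not the cited works'): a logistic regression developed on a cohort where one subgroup is rare learns the majority subgroup's feature-outcome relationship and discriminates poorly for the underrepresented subgroup once deployed on a target population where that subgroup is common. All features, prevalences, and effect sizes below are invented for illustration.

```python
# Synthetic illustration of training/target population mismatch.
# Assumption: the predictive signal sits in a different feature for
# subgroup B than for subgroup A, so a model fit mostly on A misfires on B.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def simulate(n, frac_b):
    """Hypothetical cohort: outcome depends on feature 0 in subgroup A
    and on feature 1 in subgroup B."""
    b = rng.random(n) < frac_b                      # True -> subgroup B
    x = rng.normal(0, 1, (n, 2))
    logit = np.where(b, 2.0 * x[:, 1], 2.0 * x[:, 0])
    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
    return x, y, b

# Development cohort skewed toward subgroup A (B is 5% of patients).
X_tr, y_tr, _ = simulate(20_000, frac_b=0.05)
skewed = LogisticRegression().fit(X_tr, y_tr)

# Representative development cohort for comparison (B is 40%).
X_rep, y_rep, _ = simulate(20_000, frac_b=0.40)
representative = LogisticRegression().fit(X_rep, y_rep)

# Evaluate both on a target population where B is 40% of patients.
X_te, y_te, b = simulate(20_000, frac_b=0.40)
for name, model in [("skewed-trained", skewed), ("representative-trained", representative)]:
    s = model.predict_proba(X_te)[:, 1]
    print(f"{name}: AUROC A = {roc_auc_score(y_te[~b], s[~b]):.2f}, "
          f"AUROC B = {roc_auc_score(y_te[b], s[b]):.2f}")
```

On this synthetic data the skew-trained model discriminates near chance for subgroup B while the representative-trained model performs substantially better, which is the bias pattern the cited studies report.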
“…Inconsistencies in how ML models developed from electronic health records are reported have also been noted, with details regarding race and ethnicity of participants omitted in 64% of studies, and only 12% of models externally validated. 11 To address these concerns, adapted research reporting guidelines based on the well-established EQUATOR Network (Enhancing the QUAlity and Transparency Of health Research) 12 13 and de novo recommendations by individual societies have been published, with particular relevance to AI research. In this review, we highlight those that will cover the majority of healthcare-focused AI-related studies, and explain how they differ from the well-known guidance for non-AI clinical work.…”
Section: Introduction
confidence: 99%
“…One review examining 164 models described in the scientific literature found low reporting rates of demographic variables such as race (36%) and socioeconomic status (8%) as well as low external validation rates (12%). 43 A critical review of published models for diagnosis and prognosis of COVID-19 found that most models were at high risk of bias due to poor reporting. 44 The purpose of this analysis is to assess whether the documentation available for commonly deployed models provides the information requested by model reporting guidelines.…”
Section: Introduction
confidence: 99%
“…The purpose of this analysis is to assess whether the documentation available for commonly deployed models provides the information requested by model reporting guidelines. Compared to previous work, 43,44 we focus on user-facing product documentation accompanying models. Thus, we are able to analyze models that have been deployed in practice but not yet described in peer-reviewed publications.…”
Section: Introduction
confidence: 99%