2023
DOI: 10.1093/jamia/ocad077

De-black-boxing health AI: demonstrating reproducible machine learning computable phenotypes using the N3C-RECOVER Long COVID model in the All of Us data repository

Abstract: Machine learning (ML)-driven computable phenotypes are among the most challenging to share and reproduce. Despite this difficulty, the urgent public health considerations around Long COVID make it especially important to ensure the rigor and reproducibility of Long COVID phenotyping algorithms such that they can be made available to a broad audience of researchers. As part of the NIH Researching COVID to Enhance Recovery (RECOVER) Initiative, researchers with the National COVID Cohort Collaborative (N3C) devis…

Cited by 4 publications (3 citation statements)
References 17 publications (14 reference statements)

“…Non-symptom-related, continuous variables such as healthcare utilization rates are binned to avoid overfitting and overreliance on patterns unique to the N3C population. As was true of the prior version, 12 we believe this new model version will be translatable to other sites and consortia with OMOP implementations to promote reuse and reproducibility.…”
Section: Model Explainability
mentioning
confidence: 99%
“…11 The ML model's purpose is to use information from EHR data to predict missing Long COVID labels, thus serving as a computable phenotype for Long COVID. The model was performant and generalizable, 12 but heavily relies on the existence and timing of an index date for a patient's acute COVID-19 infection. Moreover, the model only considers a patient's first SARS-CoV-2 infection, as at the time we did not anticipate the now-common occurrence of SARS-CoV-2 reinfections.…”
mentioning
confidence: 99%
“…The group developed an advanced machine learning (ML) based phenotype for long COVID and tested it using the NIH’s All of Us (AoU) data. 3,4 While an important step in identifying long COVID patients, the model relies entirely on patients’ EHR data, which is only representative of a patient’s recorded history in a medical setting. This potentially limits the algorithm’s ability to detect cases.…”
Section: Introduction
mentioning
confidence: 99%