2020
DOI: 10.1093/jamia/ocaa258

Potential limitations in COVID-19 machine learning due to data source variability: A case study in the nCov2019 dataset

Abstract: Objective: Lack of representative COVID-19 data is a bottleneck for reliable and generalizable machine learning. Data sharing is insufficient without data quality, where source variability plays an important role. We showcase and discuss potential biases from data source variability for COVID-19 machine learning. Materials and Methods: We used the publicly available nCov2019 dataset, including patient-level data from several co…

Cited by 64 publications (46 citation statements)
References 22 publications
“…There were virtually no measures to prevent the virus from spreading worldwide even when the hazard was well known to everyone, and there was an inability for detecting cases in time to prevent further contagion. Even at the time of counting the fatalities, the data collection systems have shown major discrepancies [3], evidencing a total lack of basic data management strategies. Moreover, the very fact that the pandemic was unexpected, despite warnings by some studies [4], shows the unpreparedness of public administrations and society in general.…”
Section: Introduction (mentioning)
confidence: 99%
“…The Cox model allows the analysis of the simultaneous effect of a set of covariables in the survival expressed by the hazard ratio. The Cox model implemented in lifelines [43] also allows to obtain a prediction of survival by calculating the survival expected time.…”
Section: Discussion (mentioning)
confidence: 99%
“…The main limitation in our study resides in the use of data from only one hospital, an internal validation only assures the performance of the models with similar data. We cannot ensure the reported efficiency in other hospitals and/or with other patient populations [43]. Also, data from the same centres can change over time due a wide variety of reasons such as change in protocols or external agents such a pandemic [44, 45].…”
Section: Discussion (mentioning)
confidence: 99%
“…We firstly selected the group of clusters that showed relatively better Silhouette Coefficient values, then chose the number of clusters from these which provided the most reasonable and clinically distinguishable classification regarding clinical phenotypes and demographic features. This process was supported by the COVID-19 pipelines and exploratory tool we developed in previous work [21].…”
Section: Methods (mentioning)
confidence: 99%