2022
DOI: 10.1128/msystems.00035-22

Mapping Data to Deep Understanding: Making the Most of the Deluge of SARS-CoV-2 Genome Sequences

Abstract: Next-generation sequencing has been essential to the global response to the COVID-19 pandemic. As of January 2022, nearly 7 million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences are available to researchers in public databases.

Cited by 6 publications
(9 citation statements)
References 98 publications
“…The relative success of a mixed effect modeling approach suggests that refining the modeling of group level random effects or otherwise incorporate hidden variables are necessary to account for structural issues in the data. Moreover, having established a proof of concept in this study using logistic regression and boosted decision trees, future work can explore the potential application of deep learning methods, which have proven to be highly useful to genetic sequence to function modeling in other contexts [49], [145], [146], [147].…”
Section: Discussion (mentioning)
confidence: 99%
“…The challenges of country-level variation are heightened by substantial regional imbalances in the GISAID patient data set. The entire GISAID database is fundamentally biased towards Europe, North America, and select countries in Asia and elsewhere, with over half the sample originating in either the United Kingdom or United States as of January 2022 [49] . Within the subset of data with patient status metadata, the biases are similarly idiosyncratic; for example, over 40% of the training and test samples shown in this paper were obtained from France.…”
Section: Discussion (mentioning)
confidence: 99%
“…In particular, as COVID-19 shifts from a pandemic to an endemic state, emerging genetic variants of the virus may have different health burdens and require different public health responses [2]. As a result, there is a critical need for modeling methods capable of predicting the risk of severe disease burden on hospitals and vulnerable populations as a result of continued viral evolution [3].…”
Section: Introduction (mentioning)
confidence: 99%
“…Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative virus of coronavirus disease 2019 (COVID-19), emerged at the end of 2019, burdening both the global economy and public health [1][2][3][4]. Next-generation sequencing has provided an unprecedented opportunity to monitor the COVID-19 pandemic in real-time [5,6]. During the pandemic, vast amounts of SARS-CoV-2 genome sequences have been accumulated at ever-growing rates and shared in the public database.…”
Section: Introduction (mentioning)
confidence: 99%