2021
DOI: 10.48550/arxiv.2108.03465
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

A k-mer Based Approach for SARS-CoV-2 Variant Identification

Abstract: With the rapid spread of the novel coronavirus (COVID-19) across the globe and its continuous mutation, it is of pivotal importance to design a system to identify different known (and unknown) variants of SARS-CoV-2. Identifying particular variants helps to understand and model their spread patterns, design effective mitigation strategies, and prevent future outbreaks. It also plays a crucial role in studying the efficacy of known vaccines against each variant and modeling the likelihood of breakthrough infect… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1

Citation Types

0
4
0

Year Published

2021
2021
2022
2022

Publication Types

Select...
1
1

Relationship

1
1

Authors

Journals

citations
Cited by 2 publications
(4 citation statements)
references
References 30 publications
0
4
0
Order By: Relevance
“…Sequence classification and clustering are popular approaches [8,25], most methods requiring the sequences to be aligned [17]. Such aligned sequences are used to design fixed-length feature vector representations that can be input to classical machine learning (ML) based algorithms.…”
Section: Related Workmentioning
confidence: 99%
See 2 more Smart Citations
“…Sequence classification and clustering are popular approaches [8,25], most methods requiring the sequences to be aligned [17]. Such aligned sequences are used to design fixed-length feature vector representations that can be input to classical machine learning (ML) based algorithms.…”
Section: Related Workmentioning
confidence: 99%
“…A popular approach to preserve the order of the sequences in converting the alphabets vector into numeric vector is to use the one-hot encoding approach [27]. In theory, while the one-hot encoding preserves the order of amino acid sequences, the preserved order is of no use while computing pairwise distance (e.g., euclidean distance) [8]. To deal with this problem, each sequence is first converted into length π‘˜ substrings (called π‘˜-mers).…”
Section: Introductionmentioning
confidence: 99%
See 1 more Smart Citation
“…The AI-based method has also identified another selection mechanism, vaccine breakthrough (or antibody resistance) [ 6 ]. In addition, by focusing on the amino acid sequence of the S protein, various k-mer based AI approaches (e.g., Support Vector Machine, Naive Bayes and Random Forest) have been reported [ 7 ], and the UMAP-assisted K-means clustering with the SNP profiles of SARS-CoV-2 has also been reported [ 8 ]. Unsupervised machine learning can provide new information without relying on particular models or assumptions.…”
Section: Introductionmentioning
confidence: 99%