2021 IEEE International Conference on Big Data (Big Data) 2021
DOI: 10.1109/bigdata52589.2021.9671848

Spike2Vec: An Efficient and Scalable Embedding Approach for COVID-19 Spike Sequences

Abstract: With the rapid global spread of COVID-19, more and more data related to this virus is becoming available, including genomic sequence data. The total number of genomic sequences that are publicly available on platforms such as GISAID is currently several million, and is increasing with every day. The availability of such Big Data creates a new opportunity for researchers to study this virus in detail. This is particularly important with all of the dynamics of the COVID-19 variants which emerge and circulate. Th…

Cited by 51 publications (44 citation statements)
References: 54 publications
“…In this section, we present our results for PWM2Vec and compare its performance with the baseline one-hot embedding (OHE) and the more recent k-mer-based embedding approach, which has shown to be an improvement over OHE [33,34]. For classification, we also show the results for the feature selection method (ridge regression) for all embedding approaches.…”
Section: Results (mentioning)
confidence: 99%
“…This section proposes an approach, PWM2Vec, to generate a fixed-length numerical feature embedding from coronavirus spike sequences for host specification. We also discuss the baseline approaches, specifically one-hot embedding (OHE) [32,34] and k-mer-based feature embedding [33,34]. We perform feature selection using ridge regression [70] on the resulting embedding before applying machine learning (ML) algorithms.…”
Section: Proposed Approach (mentioning)
confidence: 99%
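
The two baseline embeddings named in the statement above, one-hot embedding (OHE) and k-mer-based embedding, can be illustrated with a minimal sketch. This is not the cited authors' implementation. The 20-letter amino acid alphabet, the function names `one_hot_embedding` and `kmer_embedding`, the choice `k=3`, and the toy sequence are all assumptions made for illustration. Symbols outside the alphabet are skipped, and the ridge regression feature selection step is not shown.

```python
from itertools import product

# Assumption: the 20 standard amino acids. Real spike data may contain
# extra symbols (e.g. X for unknown), which this sketch simply skips.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_embedding(seq, alphabet=ALPHABET):
    """Concatenate one indicator vector per position.

    The output length is len(seq) * len(alphabet), so all sequences
    must share the same length (e.g. after alignment) to be comparable.
    """
    index = {aa: i for i, aa in enumerate(alphabet)}
    vec = [0] * (len(seq) * len(alphabet))
    for pos, aa in enumerate(seq):
        if aa in index:
            vec[pos * len(alphabet) + index[aa]] = 1
    return vec


def kmer_embedding(seq, k=3, alphabet=ALPHABET):
    """Count every length-k substring with a sliding window.

    The output length is len(alphabet) ** k regardless of len(seq),
    which is what makes the embedding fixed-length.
    """
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vec = [0] * len(index)
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer in index:
            vec[index[kmer]] += 1
    return vec


toy = "MFVFLVLLPLVSS"  # hypothetical toy fragment, for illustration only
ohe = one_hot_embedding(toy)
kmers = kmer_embedding(toy, k=3)
print(len(ohe), sum(ohe))      # 260 13
print(len(kmers), sum(kmers))  # 8000 11
```

The one-hot vector grows with sequence length, while the k-mer vector has a fixed size of 20^k, so the k-mer form does not require sequences of equal length.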
“…Fast and efficient solutions to the clade assignment problem would help in tracking current and evolving strains and it is crucial for the surveillance of the pathogen. This classification problem has been attacked with machine learning approaches [3,4,5] using the Spike protein amino acid sequence to drive the classification step.…”
Section: Introduction (mentioning)
confidence: 99%