The SARS-CoV-2 spike protein mediates target recognition, cellular entry, and ultimately the viral infection that leads to varying levels of COVID-19 severity. Positive evolutionary selection of mutations within the spike protein has led to the genesis of new SARS-CoV-2 variants with greatly enhanced overall fitness. Given the trend of variants with increased fitness arising from spike protein alterations, it is critical that the scientific community understand the mechanisms by which these mutations alter viral functions. As of March 2022, five SARS-CoV-2 strains were labeled “variants of concern” by the World Health Organization: the Alpha, Beta, Gamma, Delta, and Omicron variants. This review summarizes the potential mechanisms by which the common spike protein mutations found in these strains enhance the overall fitness of their respective variants. By addressing these mutations in the context of the SARS-CoV-2 spike protein structure, the spike/receptor binding interface, spike/antibody binding, and virus neutralization, we summarize the general paradigms that can be used to estimate the effects of future mutations during SARS-CoV-2 evolution.
Coronaviruses are enveloped, non-segmented, positive-sense RNA viruses. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused an ongoing pandemic, infecting 219 million people as of October 19, 2021, with a 3.6% mortality rate. Although coronaviruses have RNA proofreading functions, a large number of variants still exist as quasispecies. Natural selection can favor mutations that confer fitness advantages, including increased pathogenicity, infectivity, transmissibility, angiotensin-converting enzyme 2 (ACE2) binding affinity, and antigenicity. However, the identified coronaviruses might just be the tip of the iceberg, and potentially more fatal variants of concern (VOCs) may emerge over time. Understanding the patterns of emerging VOCs and forecasting mutations that may lead to gain of function or immune escape is urgently required. Here we developed PhyloTransformer, a Transformer-based discriminative model that uses a multi-head self-attention mechanism to model genetic mutations that may lead to viral reproductive advantage. To identify complex dependencies between the elements of each input sequence, PhyloTransformer uses advanced modeling techniques, including the Fast Attention Via positive Orthogonal Random features approach (FAVOR+) from Performer and the Masked Language Model (MLM) from Bidirectional Encoder Representations from Transformers (BERT). PhyloTransformer was trained with 1,765,297 genetic sequences retrieved from the Global Initiative for Sharing All Influenza Data (GISAID) database.
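The masked-language-model objective mentioned above can be illustrated with a minimal Python sketch. It hides a fraction of the positions in a sequence and records the original residues as prediction targets. The residue-level tokenization, the `#` mask symbol, the 15% mask rate, and the function name `mask_sequence` are all assumptions made for illustration; the abstract does not specify PhyloTransformer's actual masking scheme, and BERT's full recipe (which also keeps or randomly replaces some selected tokens) is omitted.

```python
import random

MASK = "#"  # hypothetical mask symbol, chosen so it cannot collide with a residue letter


def mask_sequence(seq, mask_rate=0.15, seed=0):
    """Hide a fraction of positions; return (masked sequence, {position: original residue})."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(seq) * mask_rate))
    positions = rng.sample(range(len(seq)), n_mask)
    # The targets are what a model would be trained to predict from the rest of the sequence.
    targets = {i: seq[i] for i in positions}
    masked = "".join(MASK if i in targets else ch for i, ch in enumerate(seq))
    return masked, targets


# Toy 20-residue fragment, used only as example input.
fragment = "NLVKNKCVNFNFNGLTGTGV"
masked, targets = mask_sequence(fragment)
print(masked)
print(targets)
```

A model trained this way has to use the unmasked context on both sides of each hidden position, which is what lets it capture dependencies between distant sites.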
First, we compared the prediction accuracy of novel mutations and novel combinations against extensive baseline models, including a Transformer-based local model, called Local Transformer, and other local models, such as ResNet-18, multilayer perceptron, logistic regression, k-nearest neighbors (KNN), random forest, and gradient boosting; we found that PhyloTransformer outperformed every baseline method with statistical significance. Second, we examined predictions of mutations in each nucleotide of the receptor binding motif (RBM), which is a specific sequence of amino acids from the SARS-CoV-2 spike protein that mediates the binding of spike protein to ACE2. Our predictions were precise: the model predicted a total of two mutations in the RBM, and these two mutations coincided with two of the four important mutations presented in seminal bench studies. Third, we predicted modifications of N-glycosylation sites to help identify mutations associated with altered glycosylation that might be favored during viral evolution. We anticipate that the viral mutations predicted by PhyloTransformer may shed light on potential new mutations that may lead to fitness advantages of SARS-CoV-2 variants. Thus, our predicted variants may guide therapeutics and vaccine design for effective targeting of future SARS-CoV-2 variants.
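To make the N-glycosylation analysis concrete, the sketch below shows one simple way to check whether a substitution creates or removes a candidate N-glycosylation site, using the standard N-X-S/T sequon (X is any residue except proline). It is an illustration only, not the authors' pipeline: the function names are hypothetical, the sequences are made-up toy strings, and it assumes substitution-only mutants of the same length as the reference.

```python
import re

# N-X-S/T sequon, where X is any residue except proline.
# The lookahead lets overlapping sequons be reported separately.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(protein):
    """Return 0-based positions of the asparagine in each candidate sequon."""
    return [m.start() for m in SEQUON.finditer(protein)]


def glycosylation_changes(reference, mutant):
    """Compare sequon positions between a reference and a same-length mutant."""
    ref = set(sequon_positions(reference))
    mut = set(sequon_positions(mutant))
    return {"lost": sorted(ref - mut), "gained": sorted(mut - ref)}


# Toy sequences (not real spike fragments): T5A removes one sequon, P8A creates another.
print(glycosylation_changes("AANGTKNPSLL", "AANGAKNASLL"))
# {'lost': [2], 'gained': [6]}
```

A sequon match only marks a candidate site; whether it is actually glycosylated depends on structural context that a pattern match cannot capture.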
Although coronaviruses have RNA proofreading functions, a large number of variants still exist as quasispecies. Identified coronaviruses might just be the tip of the iceberg, and potentially more fatal variants of concern (VOCs) may emerge over time. These VOCs may exhibit increased pathogenicity, infectivity, transmissibility, angiotensin-converting enzyme 2 (ACE2) binding affinity, and antigenicity, posing an increased threat to public health. In this article, we developed PhyloTransformer, a Transformer-based self-supervised discriminative model that can model genetic mutations that may lead to viral reproductive advantage. We trained PhyloTransformer on 1,765,297 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences to infer fitness advantages by directly modeling the amino acid sequence mutations. PhyloTransformer uses advanced techniques from natural language processing, including the Fast Attention Via positive Orthogonal Random features approach (FAVOR+) and the Masked Language Model (MLM), which enable efficient and accurate intra-sequence dependency modeling over the entire RNA sequence. We measured the prediction accuracy of novel mutations and novel combinations using our method and baseline models that only take local segments as input. We found that PhyloTransformer outperformed every baseline method with statistical significance. To identify mutations associated with altered glycosylation that might be favored during viral evolution, we predicted the occurrence of mutations in each nucleotide of the receptor binding motif (RBM) and predicted modifications of N-glycosylation sites. We anticipate that the viral mutations predicted by PhyloTransformer may identify potentially threatening mutations and guide therapeutics and vaccine design for effective targeting of future SARS-CoV-2 variants.
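The efficiency claim for FAVOR+ rests on replacing the quadratic softmax attention matrix with positive random features, so that attention can be computed in time linear in sequence length. The NumPy sketch below shows that general idea as described for Performer; it is not PhyloTransformer's implementation. All function names, shapes, and the feature count are assumptions, and the numerical-stability tricks used in practical implementations are left out.

```python
import numpy as np


def orthogonal_gaussian(m, d, rng):
    """Draw m random directions in d dimensions, orthogonal within each block of d rows."""
    blocks = []
    for _ in range(-(-m // d)):  # ceil(m / d) blocks
        q, _ = np.linalg.qr(rng.standard_normal((d, d)))
        blocks.append(q.T)
    w = np.vstack(blocks)[:m]
    # Rescale rows so their lengths are distributed like those of Gaussian vectors.
    norms = np.linalg.norm(rng.standard_normal((m, d)), axis=1, keepdims=True)
    return w * norms


def positive_features(x, w):
    """Positive feature map: phi(q) . phi(k) approximates exp(q . k) in expectation."""
    m = w.shape[0]
    sq_norm = 0.5 * np.sum(x**2, axis=-1, keepdims=True)
    return np.exp(x @ w.T - sq_norm) / np.sqrt(m)


def favor_attention(q, k, v, w):
    """Approximate softmax attention without forming the n-by-n attention matrix."""
    d = q.shape[-1]
    q, k = q / d**0.25, k / d**0.25  # reproduces the usual 1/sqrt(d) scaling of q . k
    qf, kf = positive_features(q, w), positive_features(k, w)
    kv = kf.T @ v                # (m, d_v): keys and values summarized once
    z = qf @ kf.sum(axis=0)      # (n,): per-query normalizer
    return (qf @ kv) / z[:, None]


rng = np.random.default_rng(0)
n, d, m = 8, 4, 256              # toy sizes, chosen only for illustration
q = 0.5 * rng.standard_normal((n, d))
k = 0.5 * rng.standard_normal((n, d))
v = rng.standard_normal((n, 3))
approx = favor_attention(q, k, v, orthogonal_gaussian(m, d, rng))
print(approx.shape)
```

Because every feature is positive, each output row is a weighted average of the value rows, just as in exact softmax attention, while the cost grows linearly rather than quadratically with the number of positions.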