The pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has challenged the speed at which laboratories discover the viral composition and study health outcomes. The small ~30-kb ssRNA genome of coronaviruses makes them adept at cross-species spread, but also enables a robust understanding of all the proteins the viral genome encodes. We have employed protein modeling, molecular dynamics simulations, evolutionary mapping, and 3D printing to gain a full proteome- and dynamicome-level understanding of SARS-CoV-2. We established the Viral Integrated Structural Evolution Dynamic Database (VIStEDD at prokoplab.com/vistedd) to facilitate future discoveries and educational use. Here, we highlight the use of VIStEDD for nsp6, nucleocapsid (N), and spike (S) surface glycoprotein. For both nsp6 and N, we found highly conserved surface amino acids that likely drive protein–protein interactions. In characterizing viral S protein, we developed a quantitative dynamics cross-correlation matrix to gain insights into its interactions with the angiotensin I–converting enzyme 2 (ACE2)–solute carrier family 6 member 19 (SLC6A19) dimer. Using this quantitative matrix, we elucidated 47 potential functional missense variants from genomic databases within ACE2/SLC6A19/transmembrane serine protease 2 (TMPRSS2), warranting genomic enrichment analyses in SARS-CoV-2 patients. These variants had ultralow frequency but existed in males hemizygous for ACE2. Two ACE2 noncoding variants (rs4646118 and rs143185769) present in ~9% of individuals of African descent may regulate ACE2 expression and may be associated with increased susceptibility of African Americans to SARS-CoV-2. We propose that this SARS-CoV-2 database may aid research into the ongoing pandemic.
The SARS-CoV-2 pandemic, which began in 2019, has challenged the speed at which labs perform science, from discovering the viral composition to handling health outcomes in humans. The small ~30-kb single-stranded RNA genome of coronaviruses makes them adept at cross-species spread and drift, increasing their probability of causing pandemics. However, this small genome also allows for a robust understanding of all proteins encoded by the virus. We employed protein modeling, molecular dynamics simulations, evolutionary mapping, and 3D printing to gain a full proteome and dynamicome understanding of SARS-CoV-2. The Viral Integrated Structural Evolution Dynamic Database (VIStEDD) has been established (prokoplab.com/vistedd), opening the way to future discoveries and educational use. In this paper, we highlight VIStEDD usage for nsp6, nucleocapsid (N), and spike (S) surface glycoprotein. For both nsp6 and N, we reveal highly conserved surface amino acids that likely drive protein-protein interactions. In characterizing viral S protein, we developed a quantitative dynamics cross-correlation matrix to gain insight into its interaction with the ACE2/SLC6A19 dimer complex. From this quantitative matrix, we elucidated 47 potential functional missense variants from population genomic databases within ACE2/SLC6A19/TMPRSS2, warranting genomic enrichment analyses in SARS-CoV-2 patients. Moreover, these variants have ultralow frequency but can be hemizygous in males for ACE2, which lies on the X chromosome. Two ACE2 noncoding variants (rs4646118 and rs143185769), found in ~9% of individuals of African descent, may regulate ACE2 expression and be related to increased susceptibility of African Americans to SARS-CoV-2. This SARS-CoV-2 database can aid research progress in the ongoing pandemic.
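The abstract above mentions a dynamics cross-correlation matrix without saying how it is computed. As a rough illustration only, the sketch below uses the generic textbook definition, C_ij = <Δr_i · Δr_j> / sqrt(<Δr_i²> <Δr_j²>), applied to a pre-aligned trajectory. It is not the authors' quantitative method. The function name `dccm`, the input layout, and the toy coordinates are all assumptions made for this example.

```python
from math import sqrt


def dccm(frames):
    """Return a normalized dynamic cross-correlation matrix (generic form).

    frames: list of frames; each frame is a list of (x, y, z) tuples, one
    per atom, assumed to be already aligned to a common reference.
    An atom that never moves has zero variance and would cause a
    division by zero; this sketch does not handle that case.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    # Mean position of each atom over the trajectory.
    mean = [
        tuple(sum(f[i][k] for f in frames) / n_frames for k in range(3))
        for i in range(n_atoms)
    ]
    # Per-frame displacement of each atom from its mean position.
    disp = [
        [tuple(f[i][k] - mean[i][k] for k in range(3)) for i in range(n_atoms)]
        for f in frames
    ]
    # Time-averaged dot products of displacement vectors: <dr_i . dr_j>.
    cov = [
        [
            sum(sum(d[i][k] * d[j][k] for k in range(3)) for d in disp) / n_frames
            for j in range(n_atoms)
        ]
        for i in range(n_atoms)
    ]
    # Normalize so each entry lies in [-1, 1].
    return [
        [cov[i][j] / sqrt(cov[i][i] * cov[j][j]) for j in range(n_atoms)]
        for i in range(n_atoms)
    ]


# Hypothetical toy trajectory: atoms 0 and 1 move together along x,
# while atom 2 moves in the opposite direction.
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0), (1.9, 0.0, 0.0)],
    [(-0.1, 0.0, 0.0), (0.9, 0.0, 0.0), (2.1, 0.0, 0.0)],
]
matrix = dccm(toy_frames)
```

In this toy case, entries near +1 mark atoms that move in the same direction and entries near -1 mark atoms that move in opposite directions. A real analysis would take coordinates from a simulation trajectory rather than hand-written values.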
The SOX transcription factor family is pivotal in controlling aspects of development. To identify genotype–phenotype relationships of SOX proteins, we performed an unbiased study of SOX using 1890 open reading frame and 6667 amino acid sequences in combination with structural dynamics to interpret 3999 gnomAD, 485 ClinVar, 1174 Geno2MP, and 4313 COSMIC human variants. We identified, within the HMG (High Mobility Group)-box, twenty-seven amino acids with changes in multiple SOX proteins annotated to clinical pathologies. These sites were screened through Geno2MP medical phenotypes, revealing novel SOX15 R104G associated with musculature abnormality and SOX8 R159G with intellectual disability. Within gnomAD, SOX18 E137K (rs201931544), found within the HMG-box of ~0.8% of Latinx individuals, is associated with seizures and neurological complications, potentially through blood–brain barrier alterations. A total of 56 highly conserved variants were found at sites outside the HMG-box, including several within the SOX2 HMG-box-flanking region with neurological associations, several in the SOX9 dimerization region associated with campomelic dysplasia, SOX14 K88R (rs199932938) flanking the HMG-box
Insulin is among the most well-studied genes/proteins in the human genome due to its connection to metabolic health. Within this article, we review literature and data to build a knowledge base of insulin (INS) genetics that influence transcription, transcript processing, translation, hormone maturation, secretion, receptor binding, and metabolism while highlighting the future needs of insulin research. The INS gene region has 2076 unique variants from population genetics. Several variants are found near the transcriptional start site, enhancers, and following the INS transcripts that might influence the readthrough fusion transcript INS–IGF2. This INS–IGF2 transcript splice site was confirmed within hundreds of pancreatic RNAseq samples, lacks drift based on human genome sequencing, and has possible elevated expression due to viral regulation within the liver. Moreover, a rare, poorly characterized African population-enriched variant of INS–IGF2 results in a loss of the stop codon. INS transcript UTR variants rs689 and rs3842753, associated with type 1 diabetes, are found in many pancreatic RNAseq datasets with an elevation of the 3′UTR alternatively spliced INS transcript. Finally, by combining literature, evolutionary profiling, and structural biology, we map rare missense variants that influence preproinsulin translation, proinsulin processing, dimer/hexamer secretory storage, receptor activation, and C-peptide detection for quasi-insulin blood measurements.