Objective: To analyse genome variants of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). Methods: Between 1 February and 1 May 2020, we downloaded 10 022 SARS-CoV-2 genomes from four databases. The genomes were from infected patients in 68 countries. We identified variants from pairwise alignments of each genome to the reference genome NC_045512, produced with EMBOSS needle. Nucleotide variants in the coding regions were converted to the corresponding encoded amino acid residues. For clade analysis, we used the open source software Bayesian evolutionary analysis by sampling trees (BEAST), version 2.5. Findings: We identified 5775 distinct genome variants, including 2969 missense mutations, 1965 synonymous mutations, 484 mutations in the non-coding regions, 142 non-coding deletions, 100 in-frame deletions, 66 non-coding insertions, 36 stop-gained variants, 11 frameshift deletions and two in-frame insertions. The most common variants were the synonymous 3037C > T (6334 samples), P4715L in the open reading frame 1ab (6319 samples) and D614G in the spike protein (6294 samples). We identified six major clades (that is, basal, D614G, L84S, L3606F, D448del and G392D) and 14 subclades. Among the base changes, C > T was the most common, with 1670 distinct variants. Conclusion: We found that several variants of the SARS-CoV-2 genome exist and that the D614G clade has become the most common variant since December 2019. The evolutionary analysis indicated structured transmission, with the possibility of multiple introductions into the population.
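The variant-extraction step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the two gapped rows of a global pairwise alignment (such as the rows a tool like EMBOSS needle reports) are already available as plain strings, and the function name `call_variants`, the tuple layout, and the toy sequences are hypothetical. Each differing column is reported on its own, so multi-base insertions or deletions are not merged into single events.

```python
from collections import Counter


def call_variants(ref_aln: str, qry_aln: str):
    """Walk a gapped pairwise alignment column by column.

    Returns (ref_pos, kind, ref_base, alt_base) tuples, where ref_pos is
    1-based in ungapped reference coordinates. For an insertion, ref_pos is
    the reference base immediately before the inserted base.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned rows must have the same length")
    variants = []
    ref_pos = 0
    for r, q in zip(ref_aln, qry_aln):
        if r != "-":
            ref_pos += 1  # only non-gap reference columns advance the coordinate
        if r == q:
            continue
        if r == "-":
            variants.append((ref_pos, "ins", "", q))
        elif q == "-":
            variants.append((ref_pos, "del", r, ""))
        else:
            variants.append((ref_pos, "sub", r, q))
    return variants


def base_change_counts(variants):
    """Tally substitutions by type, e.g. 'C>T'."""
    return Counter(f"{r}>{q}" for _, kind, r, q in variants if kind == "sub")


# Toy alignment (hypothetical data, not from the study).
ref_row = "ACGT-ACC"
qry_row = "ATGTGA-C"
variants = call_variants(ref_row, qry_row)
changes = base_change_counts(variants)
print(variants)
print(changes)
```

Tallying `changes` over many genomes is one way to arrive at a ranking of base-change types like the one reported in the Findings; converting coding-region variants to amino acid changes would additionally need the reference gene coordinates, which this sketch leaves out.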
During cancer therapy, tumor heterogeneity can drive the evolution of multiple tumor subclones harboring unique resistance mechanisms in an individual patient 1-3. Prior case reports and small...
Malicious programs, such as viruses and worms, are frequently related to previous programs through evolutionary relationships. Discovering those relationships and constructing a phylogeny model is expected to be helpful for analyzing new malware and for establishing a principled naming scheme. Matching permutations of code may help build better models in cases where malware evolution does not keep things in the same order. We describe methods for constructing phylogeny models that use features called n-perms to match possibly permuted code. An experiment was performed to compare the relative effectiveness of vector similarity measures using n-perms and n-grams when comparing permuted variants of programs. The similarity measures using n-perms maintained a greater separation between the similarity scores of permuted families of specimens versus unrelated specimens. A subsequent study using a tree generated through n-perms suggests that phylogeny models based on n-perms may help forensic analysts investigate new specimens, and assist in reconciling malware naming inconsistencies.
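A small Python sketch can show why order-insensitive features help with permuted code. It rests on assumptions not stated in the abstract: an n-perm is taken to be the unordered contents of an n-token window (implemented here as a sorted tuple), similarity is plain cosine similarity over feature counts, and the opcode sequences are made up for illustration. The helper names `ngrams`, `nperms`, and `cosine` are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Count ordered windows of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def nperms(tokens, n):
    """Count windows of n consecutive tokens, ignoring order inside each window."""
    return Counter(tuple(sorted(tokens[i:i + n])) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical opcode sequences: the variant swaps adjacent instructions.
original = ["push", "mov", "add", "call", "ret"]
permuted = ["mov", "push", "add", "ret", "call"]

sim_ngram = cosine(ngrams(original, 2), ngrams(permuted, 2))
sim_nperm = cosine(nperms(original, 2), nperms(permuted, 2))
print(sim_ngram, sim_nperm)  # 0.0 0.5
```

In this toy case the ordered bigrams share nothing, while the unordered windows still overlap, which mirrors the greater separation between related and unrelated specimens that the abstract reports for n-perms.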
Circulating tumor cells (CTC) are shed by cancer into the bloodstream, where a viable subset overcomes oxidative stress to initiate metastasis. We show that single CTCs from patients with melanoma coordinately upregulate lipogenesis and iron homeostasis pathways. These are correlated with both intrinsic and acquired resistance to BRAF inhibitors across clonal cultures of BRAF-mutant CTCs. The lipogenesis regulator SREBP2 directly induces transcription of the iron carrier Transferrin (TF), reducing intracellular iron pools, reactive oxygen species, and lipid peroxidation, thereby conferring resistance to inducers of ferroptosis. Knockdown of endogenous TF impairs tumor formation by melanoma CTCs, and their tumorigenic defects are partially rescued by the lipophilic antioxidants ferrostatin-1 and vitamin E. In a prospective melanoma cohort, presence of CTCs with high lipogenic and iron metabolic RNA signatures is correlated with adverse clinical outcome, irrespective of treatment regimen. Thus, SREBP2-driven iron homeostatic pathways contribute to cancer progression, drug resistance, and metastasis. Significance: Through single-cell analysis of primary and cultured melanoma CTCs, we have uncovered intrinsic cancer cell heterogeneity within lipogenic and iron homeostatic pathways that modulates resistance to BRAF inhibitors and to ferroptosis inducers. Activation of these pathways within CTCs is correlated with adverse clinical outcome, pointing to therapeutic opportunities.
Next‐generation sequencing (NGS) has emerged as an affordable and reproducible means to query tumors for somatic genetic anomalies. To help interpret somatic NGS data, many institutions have created a molecular tumor board to analyze the results of NGS and make recommendations. This article evaluates the utility of cognitive computing systems to analyze data for clinical decision‐making.
Parallel genomic alterations of antigen and payload targets mediate polyclonal acquired clinical resistance to sacituzumab govitecan in triple-negative breast cancer.
More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/). The advent of DNA sequencing technology is generating vast amounts of sequences that are deposited in public databases. The rate at which genomes can be sequenced has now outpaced the rate at which a sequence's function can be determined through wet-lab experimentation, thus leading to increasing demand for automated (in silico) approaches to the elucidation of protein function.
As more and more protein sequences and complete genomes become available in the public domain, in silico protein annotation is emerging as an inexpensive and effective approach for dealing with the flood of genomic data. Of the numerous approaches that have been proposed over the years, the determination of regions of similarity between a novel protein of unknown function and one or more database proteins with known annotation has been the method of choice. Such a determination allows one to predict the common region in the protein of unknown function as exhibiting the functional characteristics of the respective region from the annotated database protein through what is frequently called a "guilty-by-association" approach. These methods are also known as homology-based methods, and they have led to significant advances in protein annotation (2,22,36). During the latter half of the 1990s, pattern-based approaches have been steadily gaining ground as the methods of choice for solving various computational problems in molecular biology (28). One such algorithm is MUSCA, a multiple sequence alignment algorithm, which we described in...
Using TEIRESIAS, a pattern discovery method that identifies all motifs present in any given set of protein sequences without requiring alignment or explicit enumeration of the solution space, we have explored the GenPept sequence database and built a dictionary of all sequence patterns with two or more instances. The entries of this dictionary, henceforth named seqlets, cover 98.12% of all amino acid positions in the input database and in essence provide a comprehensive finite set of descriptors for protein sequence space. As such, seqlets can be effectively used to describe almost every naturally occurring protein. In fact, seqlets can be thought of as building blocks of protein molecules that are a necessary (but not sufficient) condition for function or family equivalence memberships. Thus, seqlets can either define conserved family signatures or cut across molecular families and previously undetected sequence signals deriving from functional convergence. Moreover, we show that seqlets also can capture structurally conserved motifs. The availability of a dictionary of seqlets that has been derived in such an unsupervised, hierarchical manner is generating new opportunities for addressing problems that range from reliable classification and the correlation of sequence fragments with functional categories to faster and sensitive engines for homology searches, evolutionary studies, and protein structure prediction. Proteins 1999;37:264-277.
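To make the coverage figure concrete, the following Python sketch computes the fraction of residue positions covered by a pattern dictionary. It does not implement TEIRESIAS or pattern discovery: the patterns are supplied by hand, each is assumed to contain only residue letters and `.` as a single-position wildcard, wildcard positions are counted as covered, and the function name `coverage` and the toy sequences are hypothetical. Only patterns with two or more instances across the input are kept, echoing the dictionary's inclusion rule.

```python
import re


def coverage(sequences, patterns, min_instances=2):
    """Fraction of positions covered by patterns with enough instances."""
    covered = [[False] * len(s) for s in sequences]
    for pattern in patterns:
        # A lookahead with a capturing group finds overlapping instances too.
        rx = re.compile(f"(?=({pattern}))")
        hits = [
            (i, m.start(1), m.end(1))
            for i, seq in enumerate(sequences)
            for m in rx.finditer(seq)
        ]
        if len(hits) < min_instances:
            continue  # too few instances to enter the dictionary
        for i, start, end in hits:
            for j in range(start, end):
                covered[i][j] = True
    total = sum(len(s) for s in sequences)
    return sum(sum(row) for row in covered) / total if total else 0.0


# Toy input (hypothetical): "TAY" occurs twice, "K.A" once, "WWW" never.
toy_sequences = ["MKTAYIAK", "GGTAYQ"]
toy_patterns = ["TAY", "K.A", "WWW"]
toy_cov = coverage(toy_sequences, toy_patterns)
print(round(toy_cov, 4))  # 0.4286
```

Here 6 of the 14 residues fall inside an instance of a retained pattern, so the toy coverage is 6/14.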