Motivation: Expert curation to differentiate between functionally diverged homologs and those that may still share a similar function routinely relies on the visual interpretation of domain architecture changes. However, the size of contemporary data sets integrating homologs from hundreds to thousands of species calls for alternative solutions. Scoring schemes that evaluate domain architecture similarities can, in principle, help to automate this procedure. But existing schemes are often too simplistic in their similarity assessment, many require an a priori resolution of overlapping domain annotations, and those that allow overlaps in order to extend the set of annotation sources cannot account for redundant annotations. As a consequence, the gap between automated similarity scoring and similarity assessment based on visual architecture comparison is still too wide to make the integration of both approaches meaningful. Results: Here, we present FAS, a scoring system for the comparison of multi-layered feature architectures integrating information from a broad spectrum of annotation sources. Feature architectures are represented as directed acyclic graphs, and redundancies are resolved in the course of comparison using a score maximization algorithm. A benchmark using more than 10,000 human-yeast ortholog pairs reveals that FAS consistently outperforms existing scoring schemes. Using three examples, we show how automated architecture similarity assessments can be routinely applied in the benchmarking of orthology assignment software, in the identification of functionally diverged orthologs, and in the identification of entries in protein collections that most likely stem from a faulty gene prediction. Availability and implementation: FAS is available as a Python package: https://pypi.org/project/greedyFAS/
Motivation: Protein sequence comparison is a fundamental element in the bioinformatics toolkit. When sequences are annotated with features such as functional domains, transmembrane domains, low complexity regions or secondary structure elements, the resulting feature architectures allow better informed comparisons. However, many existing schemes for scoring architecture similarities cannot cope with features arising from multiple annotation sources. Those that do fall short in the resolution of overlapping and redundant feature annotations. Results: Here, we introduce FAS, a scoring method that integrates features from multiple annotation sources in a directed acyclic architecture graph. Redundancies are resolved as part of the architecture comparison by finding the paths through the graphs that maximise the pairwise architecture similarity. In a large-scale evaluation on more than 10,000 human-yeast ortholog pairs, architecture similarities assessed with FAS are consistently more plausible than those obtained using e-values to resolve overlaps or leaving overlaps unresolved. Three case studies demonstrate the utility of FAS on architecture comparison tasks: benchmarking of orthology assignment software, identification of functionally diverged orthologs, and diagnosing protein architecture changes stemming from faulty gene predictions. With the help of FAS, feature architecture comparisons can now be routinely integrated into these and many other applications. Availability: FAS is available as a Python package: https://pypi.org/project/greedyFAS/ Supplementary information: Supplementary data are available at Bioinformatics online.
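The path-based resolution of redundant annotations described above can be illustrated with a small toy example. The sketch below is not the greedyFAS implementation and does not use its API: the `Feature` type, the exhaustive path enumeration, and the multiset-overlap `similarity` function are all assumptions made for illustration, and the feature names and coordinates are invented. It only shows the general idea of treating non-overlapping features as paths in a directed acyclic graph and picking the pair of paths with the highest score.

```python
from collections import Counter
from itertools import product
from typing import NamedTuple


class Feature(NamedTuple):
    name: str   # feature type, e.g. a domain identifier (hypothetical values below)
    start: int  # first residue covered (inclusive)
    end: int    # last residue covered (inclusive)


def paths(features):
    """Yield all maximal chains of mutually non-overlapping features.

    An edge a -> b exists when a ends before b starts, so the graph is
    acyclic by construction. Overlapping features never share a path.
    """
    feats = sorted(features, key=lambda f: (f.start, f.end))

    def extend(path):
        last = path[-1]
        after = [f for f in feats if f.start > last.end]
        # Keep only direct successors: no other candidate fits in between.
        direct = [f for f in after if not any(g.end < f.start for g in after)]
        if not direct:
            yield path
        for f in direct:
            yield from extend(path + [f])

    # Start nodes are features that no other feature fully precedes.
    starts = [f for f in feats if not any(g.end < f.start for g in feats)]
    for s in starts:
        yield from extend([s])


def similarity(path_a, path_b):
    """Placeholder score (an assumption): multiset overlap of feature names."""
    a = Counter(f.name for f in path_a)
    b = Counter(f.name for f in path_b)
    total = sum((a | b).values())
    return sum((a & b).values()) / total if total else 1.0


def best_score(features_a, features_b):
    """Resolve redundancy by choosing the best-scoring pair of paths."""
    cand_a = list(paths(features_a)) or [[]]
    cand_b = list(paths(features_b)) or [[]]
    return max(similarity(pa, pb) for pa, pb in product(cand_a, cand_b))


# Two overlapping annotations of the same region in protein A (invented data).
protein_a = [Feature("Kinase", 10, 200), Feature("S_TKc", 15, 205),
             Feature("SH2", 250, 330)]
protein_b = [Feature("S_TKc", 5, 190), Feature("SH2", 240, 320)]
print(best_score(protein_a, protein_b))
```

In this toy case, protein A yields two candidate paths, one per overlapping annotation, and the comparison keeps the one that matches protein B best. Exhaustive enumeration is only practical for small architectures; the package name suggests that FAS itself uses a greedy strategy, which this sketch does not attempt to reproduce.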
Covid-19 is the most devastating pandemic of the past 100 years. A zoonotic transfer, presumably at a wildlife market, introduced the causative virus, SARS-CoV-2 (sarbecovirus; beta-coronavirus), to humans in late 2019. Meanwhile, the mechanistic details of the infection process have been largely elucidated, and structural models explain binding of the viral spike to the human cell surface receptor ACE2. Yet, the evolutionary trajectory that gave rise to this pathogen is poorly understood. Here we scan SARS-CoV-2 protein sequences in silico for innovations along the evolutionary lineage starting with the last common ancestor of coronaviruses. Substantial differences in the sets of proteins encoded by SARS-CoV-2 and viruses outside sarbecovirus, and in their domain architectures, indicate divergent functional demands. By contrast, sarbecoviruses themselves are almost fully conserved at these levels of resolution. However, profiling spike evolution on the sub-domain level using predicted linear epitopes reveals that this protein was gradually reshaped within sarbecovirus. The only epitope that is private to SARS-CoV-2 overlaps with the furin cleavage site. This lends phylogenetic support to the hypothesis that a change in strategy facilitated the zoonotic transfer of SARS-CoV-2 and its success as a human pathogen. Upon furin cleavage, spike switches from a “stealth mode” where immunodominant ACE2 binding epitopes are largely hidden to an “attack mode” where these epitopes are exposed. The resulting reinforcement of ACE2 binding extends the window of opportunity for cell entry. SARS-CoV-2 variants fine-tuning this mode switch will be particularly threatening as they optimize immune evasion.