Abstract: We present three clustered protein sequence databases, Uniclust90, Uniclust50 and Uniclust30, and three databases of multiple sequence alignments (MSAs), Uniboost10, Uniboost20 and Uniboost30, as a resource for protein sequence analysis, function prediction and sequence searches. The Uniclust databases cluster UniProtKB sequences at the level of 90%, 50% and 30% pairwise sequence identity. Uniclust90 and Uniclust50 clusters showed better consistency of functional annotation than those of UniRef90 and UniRef50, owi…
“…Protein domains were identified using a representative set of RNA virus genomes, including representative members of ICTV-approved virus families and unclassified virus groups. This set was annotated manually using sensitive profile-profile comparisons with the HHsuite package (127), and hmm profiles for annotated proteins or their domains were generated by running one iteration of HHblits against the latest (October 2017) uniclust30 database (131). Each annotated profile was assigned to a functional category (e.g., “capsid protein_jelly-roll,” “chymotrypsin-like protease”).…”
Viruses with RNA genomes dominate the eukaryotic virome, reaching enormous diversity in animals and plants. The recent advances of metaviromics prompted us to perform a detailed phylogenomic reconstruction of the evolution of the dramatically expanded global RNA virome. The only universal gene among RNA viruses is the gene encoding the RNA-dependent RNA polymerase (RdRp). We developed an iterative computational procedure that alternates the RdRp phylogenetic tree construction with refinement of the underlying multiple-sequence alignments. The resulting tree encompasses 4,617 RNA virus RdRps and consists of 5 major branches; 2 of the branches include positive-sense RNA viruses, 1 is a mix of positive-sense (+) RNA and double-stranded RNA (dsRNA) viruses, and 2 consist of dsRNA and negative-sense (−) RNA viruses, respectively. This tree topology implies that dsRNA viruses evolved from +RNA viruses on at least two independent occasions, whereas −RNA viruses evolved from dsRNA viruses. Reconstruction of RNA virus evolution using the RdRp tree as the scaffold suggests that the last common ancestors of the major branches of +RNA viruses encoded only the RdRp and a single jelly-roll capsid protein. Subsequent evolution involved independent capture of additional genes, in particular, those encoding distinct RNA helicases, enabling replication of larger RNA genomes and facilitating virus genome expression and virus-host interactions. Phylogenomic analysis reveals extensive gene module exchange among diverse viruses and horizontal virus transfer between distantly related hosts. Although the network of evolutionary relationships within the RNA virome is bound to further expand, the present results call for a thorough reevaluation of the RNA virus taxonomy.
IMPORTANCE: The majority of the diverse viruses infecting eukaryotes have RNA genomes, including numerous human, animal, and plant pathogens. Recent advances of metagenomics have led to the discovery of many new groups of RNA viruses in a wide range of hosts. These findings enable a far more complete reconstruction of the evolution of RNA viruses than was attainable previously. This reconstruction reveals the relationships between different Baltimore classes of viruses and indicates extensive transfer of viruses between distantly related hosts, such as plants and animals. These results call for a major revision of the existing taxonomy of RNA viruses.
“…We searched for homologs with HHsearch against the PDB70 database, which has a maximum mutual sequence identity of 70% between proteins deposited in PDB, released on May 23, 2018. Sequence profiles were generated with HHblits by searching homologous sequences with the default options against Uniclust30, which is a clustered UniProtKB database at the level of 30% pairwise sequence identity, released in September 2016. We predicted bound ligands by considering the structural similarity of detected homologs.…”
Protein model refinement has been an essential part of successful protein structure prediction. Molecular dynamics simulation‐based refinement methods have shown consistent improvement of protein models. There had been progress in the extent of refinement for a few years since the idea of ensemble averaging of sampled conformations emerged. There was little progress in CASP12 because conformational sampling was not sufficiently diverse due to harmonic restraints. During CASP13, a new refinement method was tested that achieved significant improvements over CASP12. The new method intended to address previous bottlenecks in the refinement problem by introducing new features. Flat‐bottom harmonic restraints replaced harmonic restraints, sampling was performed iteratively, and a new scoring function and selection criteria were used. The new protocol expanded conformational sampling at reduced computational costs. In addition to overall improvements, some models were refined significantly to near‐experimental accuracy.
“…Gene models were predicted by BRAKER1 v1.11 (Hoff, Lange, Lomsadze, Borodovsky, & Stanke, 2016) using the soft-masked genome assembly and the STAR alignment file as inputs. Gene models were annotated by querying models against the Uniclust90 database (Mirdita et al, 2017) using MMseqs2 with an e value < 1e−05.…”
Section: Gene Model Prediction and Annotation (mentioning)
confidence: 99%
“…Gene models were annotated by querying models against the Uniclust90 database (Mirdita et al, 2017) using MMseqs2 with an e value < 1e−05. Gene Ontology (GO) terms associated with the representative UniProtKB sequence for each Uniclust90 hit were attributed to the A. tenebrosa gene model using the idmapping_selected.tab file provided by UniProtKB.…”
Section: Gene Model Prediction and Annotation (mentioning)
Sea anemones have a wide array of toxic compounds (peptide toxins found in their venom) which have potential uses as therapeutics. To date, the majority of studies characterizing toxins in sea anemones have been restricted to species from the superfamily Actinioidea. No highly complete draft genomes are currently available for this superfamily, however, highlighting our limited understanding of the genes encoding toxins in this important group. Here we have sequenced, assembled, and annotated a draft genome for Actinia tenebrosa. The genome is estimated to be approximately 255 megabases, with 31,556 protein‐coding genes. Quality metrics revealed that this draft genome matches the quality and completeness of other model cnidarian genomes, including Nematostella, Hydra, and Acropora. Phylogenomic analyses revealed strong conservation of the Cnidaria and Hexacorallia core‐gene set. However, we found that lineage‐specific gene families have undergone significant expansion events compared with shared gene families. Enrichment analysis performed for both gene ontologies and protein domains revealed that genes encoding toxins contribute to a significant proportion of the lineage‐specific genes and gene families. The results make clear that the draft genome of A. tenebrosa will provide insight into the evolution of toxins and lineage‐specific genes, and provide an important resource for the discovery of novel biological compounds.
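The annotation step quoted in the citing statement above (MMseqs2 hits kept at an e value below 1e−05, then GO terms transferred from the matched UniProtKB sequence via idmapping_selected.tab) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that the MMseqs2 results are in the BLAST-like tabular layout with the e value in the 11th column, that the target identifier is a bare UniProtKB accession, that GO terms sit in the 7th column of the mapping file separated by semicolons, and that only the best hit per gene model is used. The function names `best_hits`, `load_go_terms` and `annotate` are hypothetical.

```python
EVALUE_CUTOFF = 1e-05  # threshold quoted in the citing statement


def best_hits(m8_lines, cutoff=EVALUE_CUTOFF):
    """Map each query to its lowest-e-value target below the cutoff.

    Assumes 12 tab-separated columns with the e value at index 10.
    """
    best = {}
    for line in m8_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, target, evalue = fields[0], fields[1], float(fields[10])
        if evalue >= cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (target, evalue)
    return {query: target for query, (target, _) in best.items()}


def load_go_terms(idmap_lines, wanted):
    """Collect GO terms for the wanted accessions.

    Assumes the accession is in column 1 and a ';'-separated GO list
    is in column 7 of the mapping file.
    """
    go_terms = {}
    for line in idmap_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6 and fields[0] in wanted and fields[6]:
            terms = [t.strip() for t in fields[6].split(";") if t.strip()]
            go_terms[fields[0]] = terms
    return go_terms


def annotate(m8_lines, idmap_lines):
    """Attribute GO terms of the best hit to each gene model."""
    hits = best_hits(m8_lines)
    go_terms = load_go_terms(idmap_lines, set(hits.values()))
    return {gene: go_terms.get(acc, []) for gene, acc in hits.items()}
```

Both functions accept any iterable of lines, so open file handles can be passed directly; gene models without a hit below the cutoff are simply absent from the result.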