Uniclust databases of clustered and deeply annotated protein sequences and alignments

Mirdita, Milot; Driesch, Lars von den; Galiez, Clovis; Martı́n, Marı́a Jesús; Söding, Johannes; Steinegger, Martin

doi:10.1093/nar/gkw1081

Cited by 618 publications

(552 citation statements)

References 24 publications

(24 reference statements)


“…Protein domains were identified using a representative set of RNA virus genomes, including representative members of ICTV-approved virus families and unclassified virus groups. This set was annotated manually using sensitive profile-profile comparisons with the HHsuite package (127), and hmm profiles for annotated proteins or their domains were generated by running one iteration of HHblits against the latest (October 2017) uniclust30 database (131). Each annotated profile was assigned to a functional category (e.g., "capsid protein_jelly-roll," "chymotrypsin-like protease").…”

Section: Methods (mentioning)

confidence: 99%

Origins and Evolution of the Global RNA Virome

Wolf

Kazlauskas

Iranzo

et al. 2018

mBio


Viruses with RNA genomes dominate the eukaryotic virome, reaching enormous diversity in animals and plants. The recent advances of metaviromics prompted us to perform a detailed phylogenomic reconstruction of the evolution of the dramatically expanded global RNA virome. The only universal gene among RNA viruses is the gene encoding the RNA-dependent RNA polymerase (RdRp). We developed an iterative computational procedure that alternates the RdRp phylogenetic tree construction with refinement of the underlying multiple-sequence alignments. The resulting tree encompasses 4,617 RNA virus RdRps and consists of 5 major branches; 2 of the branches include positive-sense RNA viruses, 1 is a mix of positive-sense (+) RNA and double-stranded RNA (dsRNA) viruses, and 2 consist of dsRNA and negative-sense (−) RNA viruses, respectively. This tree topology implies that dsRNA viruses evolved from +RNA viruses on at least two independent occasions, whereas −RNA viruses evolved from dsRNA viruses. Reconstruction of RNA virus evolution using the RdRp tree as the scaffold suggests that the last common ancestors of the major branches of +RNA viruses encoded only the RdRp and a single jelly-roll capsid protein. Subsequent evolution involved independent capture of additional genes, in particular, those encoding distinct RNA helicases, enabling replication of larger RNA genomes and facilitating virus genome expression and virus-host interactions. Phylogenomic analysis reveals extensive gene module exchange among diverse viruses and horizontal virus transfer between distantly related hosts. Although the network of evolutionary relationships within the RNA virome is bound to further expand, the present results call for a thorough reevaluation of the RNA virus taxonomy.

IMPORTANCE The majority of the diverse viruses infecting eukaryotes have RNA genomes, including numerous human, animal, and plant pathogens. Recent advances of metagenomics have led to the discovery of many new groups of RNA viruses in a wide range of hosts. These findings enable a far more complete reconstruction of the evolution of RNA viruses than was attainable previously. This reconstruction reveals the relationships between different Baltimore classes of viruses and indicates extensive transfer of viruses between distantly related hosts, such as plants and animals. These results call for a major revision of the existing taxonomy of RNA viruses.



“…We searched for homologs with HHsearch against the PDB70 database, which has a maximum mutual sequence identity of 70% between proteins deposited in PDB, released on May 23, 2018. Sequence profiles were generated with HHblits by searching homologous sequences with the default options against Uniclust30, which is a clustered UniProtKB database at the level of 30% pairwise sequence identity, released in September 2016. We predicted bound ligands by considering the structural similarity of detected homologs.…”

Section: Methods (mentioning)

confidence: 99%

Driven to near‐experimental accuracy by refinement via molecular dynamics simulations

2019


Protein model refinement has been an essential part of successful protein structure prediction. Molecular dynamics simulation‐based refinement methods have shown consistent improvement of protein models. There had been progress in the extent of refinement for a few years since the idea of ensemble averaging of sampled conformations emerged. There was little progress in CASP12 because conformational sampling was not sufficiently diverse due to harmonic restraints. During CASP13, a new refinement method was tested that achieved significant improvements over CASP12. The new method intended to address previous bottlenecks in the refinement problem by introducing new features. Flat‐bottom harmonic restraints replaced harmonic restraints, sampling was performed iteratively, and a new scoring function and selection criteria were used. The new protocol expanded conformational sampling at reduced computational costs. In addition to overall improvements, some models were refined significantly to near‐experimental accuracy.


“…Gene models were predicted by BRAKER1 v1.11 (Hoff, Lange, Lomsadze, Borodovsky, & Stanke, 2016) using the soft-masked genome assembly and the STAR alignment file as inputs. Gene models were annotated by querying models against the Uniclust90 database (Mirdita et al, 2017) using MMseqs2 with an e-value < 1e-05.…”

Section: Gene Model Prediction and Annotation (mentioning)

confidence: 99%

“…Gene models were annotated by querying models against the Uniclust90 database (Mirdita et al, 2017) using MMseqs2 with an e-value < 1e-05. Gene Ontology (GO) terms associated with the representative UniProtKB sequence for each Uniclust90 hit were attributed to the A. tenebrosa gene model using the idmapping_selected.tab file provided by UniProtKB.…”

Section: Gene Model Prediction and Annotation (mentioning)

confidence: 99%

The draft genome of Actinia tenebrosa reveals insights into toxin evolution

Surm

Stewart

Papanicolaou

et al. 2019

Ecology and Evolution


Sea anemones have a wide array of toxic compounds (peptide toxins found in their venom) which have potential uses as therapeutics. To date, the majority of studies characterizing toxins in sea anemones have been restricted to species from the superfamily, Actinioidea. No highly complete draft genomes are currently available for this superfamily, however, highlighting our limited understanding of the genes encoding toxins in this important group. Here we have sequenced, assembled, and annotated a draft genome for Actinia tenebrosa. The genome is estimated to be approximately 255 megabases, with 31,556 protein‐coding genes. Quality metrics revealed that this draft genome matches the quality and completeness of other model cnidarian genomes, including Nematostella, Hydra, and Acropora. Phylogenomic analyses revealed strong conservation of the Cnidaria and Hexacorallia core‐gene set. However, we found that lineage‐specific gene families have undergone significant expansion events compared with shared gene families. Enrichment analysis performed for both gene ontologies, and protein domains revealed that genes encoding toxins contribute to a significant proportion of the lineage‐specific genes and gene families. The results make clear that the draft genome of A. tenebrosa will provide insight into the evolution of toxins and lineage‐specific genes, and provide an important resource for the discovery of novel biological compounds.
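The annotation step quoted above (keep Uniclust90 hits with an e-value below 1e-05, then attach the GO terms of the hit's UniProtKB representative) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It rests on several assumptions: the search results were exported in the 12-column BLAST-style tabular layout with the e-value in column 11, target identifiers are bare UniProtKB accessions, GO terms sit in column 7 of idmapping_selected.tab as a semicolon-separated list, and only the best-scoring hit per gene model is used. The function names `best_hits`, `load_go_terms`, and `annotate` are hypothetical.

```python
EVALUE_CUTOFF = 1e-5  # threshold stated in the quoted methods text


def best_hits(m8_lines, cutoff=EVALUE_CUTOFF):
    """Map each query to its lowest e-value hit that is below the cutoff.

    Assumes 12 tab-separated columns per line, with the query in column 1,
    the target in column 2 and the e-value in column 11.
    """
    best = {}
    for line in m8_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip malformed or empty lines
        query, target, evalue = fields[0], fields[1], float(fields[10])
        if evalue >= cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (target, evalue)
    return best


def load_go_terms(idmapping_lines, accessions):
    """Collect GO terms for the wanted accessions.

    Assumes column 1 is the UniProtKB accession and column 7 holds the
    GO terms separated by semicolons.
    """
    go_terms = {}
    for line in idmapping_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6 and fields[0] in accessions and fields[6]:
            terms = [t.strip() for t in fields[6].split(";") if t.strip()]
            go_terms[fields[0]] = terms
    return go_terms


def annotate(m8_lines, idmapping_lines):
    """Attach the GO terms of each query's best hit to that query."""
    hits = best_hits(m8_lines)
    wanted = {target for target, _ in hits.values()}
    go_terms = load_go_terms(idmapping_lines, wanted)
    return {query: go_terms.get(target, []) for query, (target, _) in hits.items()}
```

Both inputs are iterables of lines, so open file handles can be passed directly. Gene models without any hit under the cutoff are left out of the result, and hits whose accession has no GO entry map to an empty list.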
