Silvio Tosatto scite author profile

DisProt in 2022: improved quality and accessibility of protein intrinsic disorder annotation

The Database of Intrinsically Disordered Proteins (DisProt, URL: https://disprot.org) is the major repository of manually curated annotations of intrinsically disordered proteins and regions from the literature. We report here recent updates of DisProt version 9, including a restyled web interface, a refactored Intrinsically Disordered Proteins Ontology (IDPO), improvements in the curation process and significant content growth of around 30%. Higher quality and consistency of annotations are provided by a newly implemented reviewing process and training of curators. The increased curation capacity is fostered by the integration of DisProt with APICURON, a dedicated resource for the proper attribution and recognition of biocuration efforts. Better interoperability is provided through the adoption of the Minimum Information About Disorder (MIADE) standard, an active collaboration with the Gene Ontology (GO) and Evidence and Conclusion Ontology (ECO) consortia and the support of the ELIXIR infrastructure.


DOME: recommendations for supervised machine learning validation in biology

Walsh et al. 2021


Modern biology frequently relies on machine learning to provide predictions and improve decision processes. There have been recent calls for more scrutiny of machine learning performance and possible limitations. Here we present a set of community-wide recommendations aiming to help establish standards of machine learning validation in biology. Adopting a structured methods description for machine learning based on DOME (data, optimization, model, evaluation) will allow both reviewers and readers to better understand and assess the performance and limitations of a method or outcome. The recommendations are complemented by a machine learning summary table which can be easily included in the supplementary material of published papers.

Excerpt from the example summary table:

Data
Redundancy between data splits: Maximum pairwise identity within and between training and test set is 25%, enforced with the UniqueProt tool.
Availability of data: Yes, URL: http://protein.bio.unipd.it/espritz/

Optimization
Algorithm: BRNN (bi-directional recurrent neural network) with ensemble averaging.
Meta-predictions: No.
Data encoding: Sliding window of length 23 residues on the input sequence with "one hot" encoding (i.e. 20 inputs per residue).
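The data-encoding entry can be illustrated with a short Python sketch. For each residue it takes a 23-residue window centred on that residue and concatenates one 20-value one-hot vector per window position (23 × 20 = 460 inputs per residue). The residue ordering in `AMINO_ACIDS`, the zero-padding beyond the sequence termini and the all-zero encoding of non-standard residues are assumptions made for illustration, not details given in the table; the helper names are hypothetical.

```python
# Hypothetical sketch of sliding-window one-hot encoding for protein sequences.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue):
    """Return a 20-value one-hot vector; unknown residues map to all zeros (assumption)."""
    vec = [0] * len(AMINO_ACIDS)
    idx = AA_INDEX.get(residue.upper())
    if idx is not None:
        vec[idx] = 1
    return vec


def window_one_hot(sequence, window=23):
    """Return one flat feature vector (window * 20 values) per residue.

    Window positions that fall off either terminus are zero-padded (assumption).
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd so it is centred on a residue")
    half = window // 2
    features = []
    for centre in range(len(sequence)):
        row = []
        for pos in range(centre - half, centre + half + 1):
            if 0 <= pos < len(sequence):
                row.extend(one_hot(sequence[pos]))
            else:
                row.extend([0] * len(AMINO_ACIDS))  # padding outside the sequence
        features.append(row)
    return features


if __name__ == "__main__":
    feats = window_one_hot("MKVLAAGIVG")
    print(len(feats), len(feats[0]))  # 10 residues, 460 inputs each
```

An odd window keeps the residue being classified exactly in the middle, so 11 neighbours on each side contribute context.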


A divide and conquer approach to fast loop modeling

Tosatto, Bindewald, Hesser et al. 2002


We describe a fast ab initio method for modeling local segments in protein structures. The algorithm is based on a divide and conquer approach and uses a database of precalculated look-up tables, which represent a large set of possible conformations for loop segments of variable length. The target loop is recursively decomposed until the resulting conformations are small enough to be compiled analytically. The algorithm, which is not restricted to any specific loop length, generates a ranked set of loop conformations in 20-180 s on a desktop PC. The prediction quality is evaluated in terms of global RMSD. Depending on loop length, the top prediction varies between 1.06 Å RMSD for three-residue loops and 3.72 Å RMSD for eight-residue loops. Due to its speed the method may also be useful to generate alternative starting conformations for complex simulations.
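Since prediction quality is reported as RMSD in ångströms, a minimal Python sketch of the measure may help. It assumes the predicted and native loop coordinates (for example, one atom per residue) are already expressed in the same reference frame, so no superposition is performed; the atom selection and whether this matches the paper's exact "global RMSD" protocol are assumptions. `rmsd` is a hypothetical helper name.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes both coordinate sets are already in the same frame (no superposition).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy three-point "loop": the model is shifted by 1 Å along z.
    native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    model = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
    print(rmsd(native, model))  # 1.0
```

Skipping the superposition step is what makes the value "global" in this sketch: a loop placed in the wrong position relative to the fixed protein body is penalised even if its internal shape is correct.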


Simple consensus procedures are effective and sufficient in secondary structure prediction

Albrecht, Tosatto, Lengauer et al. 2003

Protein Engineering, Design and Selection


We have analyzed the performance of majority voting on minimal combination sets of three state-of-the-art secondary structure prediction methods in order to obtain a consensus prediction. Using three large benchmark sets from the EVA server, our results show a significant improvement in the average Q3 prediction accuracy of up to 1.5 percentage points by consensus formation. The application of an additional trivial filtering procedure for predicted secondary structure elements that are too short does not significantly affect the prediction accuracy. Our analysis also provides valuable insight into the similarity of the results of the prediction methods that we combine, as well as the higher confidence in consistently predicted secondary structure.
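The majority-voting idea can be sketched in Python. Each method contributes a three-state string (H helix, E strand, C coil) over the same residues; the consensus takes, at every position, the state predicted by at least two of the three methods, and Q3 is the percentage of residues whose state matches the observed assignment. The tie rule (keep one chosen method's call when all three disagree), the function names and the toy strings are assumptions for illustration; the short-element filter is not reproduced.

```python
from collections import Counter


def majority_vote(predictions, tie_breaker=0):
    """Per-residue majority vote over equal-length 3-state strings (H, E, C).

    If no state is predicted by more than half of the methods, the state from
    predictions[tie_breaker] is kept (this tie rule is an assumption).
    """
    lengths = {len(p) for p in predictions}
    if len(lengths) != 1:
        raise ValueError("all predictions must cover the same residues")
    consensus = []
    for states in zip(*predictions):
        state, count = Counter(states).most_common(1)[0]
        if count * 2 > len(states):
            consensus.append(state)  # strict majority wins
        else:
            consensus.append(states[tie_breaker])  # all methods disagree
    return "".join(consensus)


def q3(predicted, observed):
    """Q3: percentage of residues whose 3-state assignment matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    observed = "CCHHHHCCEEEC"  # toy reference assignment
    methods = ["CCHHHHHCEECC", "CHHHHCCCEEEC", "CCHHHHCEEEEC"]  # toy predictions
    for m in methods:
        print(round(q3(m, observed), 1))  # 83.3, 83.3, 91.7
    print(round(q3(majority_vote(methods), observed), 1))  # 100.0
```

In this toy case each method makes errors at different positions, so the vote outvotes every individual mistake; real methods share errors more often, which is why the reported gains are a few percentage points rather than this idealised jump.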


PDBe-KB: collaboratively defining the biological context of structural data

Váradi, Anyango, Armstrong et al. 2021


The Protein Data Bank in Europe – Knowledge Base (PDBe-KB, https://pdbe-kb.org) is an open collaboration between world-leading specialist data resources contributing functional and biophysical annotations derived from or relevant to the Protein Data Bank (PDB). The goal of PDBe-KB is to place macromolecular structure data in their biological context by developing standardised data exchange formats and integrating functional annotations from the contributing partner resources into a knowledge graph that can provide valuable biological insights. Since we described PDBe-KB in 2019, there have been significant improvements in the variety of available annotation data sets and user functionality. Here, we provide an overview of the consortium, highlighting the addition of annotations such as predicted covalent binders, phosphorylation sites, effects of mutations on the protein structure and energetic local frustration. In addition, we describe a library of reusable web-based visualisation components and introduce new features such as a bulk download data service and a novel superposition service that generates clusters of superposed protein chains weekly for the whole PDB archive.



Silvio Tosatto

DisProt in 2022: improved quality and accessibility of protein intrinsic disorder annotation

DOME: recommendations for supervised machine learning validation in biology

A divide and conquer approach to fast loop modeling

Simple consensus procedures are effective and sufficient in secondary structure prediction

PDBe-KB: collaboratively defining the biological context of structural data
