BioJava 5: A community driven open-source bioinformatics library

Lafita, Aleix; Bliven, Spencer; Prlić, Andreas; Guzenko, Dmytro; Rose, Peter W.; Bradley, A.R.; Pavan, Paolo; Myers-Turnbull, Douglas; Valasatava, Yana; Heuer, Michael; Larson, Matt; Burley, S.K.; Duarte, José M.

doi:10.1371/journal.pcbi.1006791

Cited by 43 publications

(40 citation statements)

References 24 publications


“…b NR stands for Non-Redundant at a given percentage (number) of sequence identity. To carry out the sequence identity comparisons, we used a local alignment algorithm (Smith-Waterman [30]) implemented in BioJava [31], with the BLOSUM62 substitution matrix. … now on, the Spearman correlation-based filter with a threshold equal to 0.8 will be the one used during the first stage of the feature selection process to compute a candidate set.…”

Section: Results (mentioning)

confidence: 99%

Automatic construction of molecular similarity networks for visual graph mining in chemical space of bioactive peptides: an unsupervised learning approach

Aguilera‐Mendoza, Marrero-Ponce, García-Jacas, et al. 2020, Sci Rep

The increasing interest in bioactive peptides with therapeutic potentials has been reflected in a large variety of biological databases published over the last years. However, the knowledge discovery process from these heterogeneous data sources is a nontrivial task, becoming the essence of our research endeavor. Therefore, we devise a unified data model based on molecular similarity networks for representing a chemical reference space of bioactive peptides, having an implicit knowledge that is currently not explicitly accessed in existing biological databases. Indeed, our main contribution is a novel workflow for the automatic construction of such similarity networks, enabling visual graph mining techniques to uncover new insights from the “ocean” of known bioactive peptides. The workflow presented here relies on the following sequential steps: (i) calculation of molecular descriptors by applying statistical and aggregation operators on amino acid property vectors; (ii) a two-stage unsupervised feature selection method to identify an optimized subset of descriptors using the concepts of entropy and mutual information; (iii) generation of sparse networks where nodes represent bioactive peptides, and edges between two nodes denote their pairwise similarity/distance relationships in the defined descriptor space; and (iv) exploratory analysis using visual inspection in combination with clustering and network science techniques. For practical purposes, the proposed workflow has been implemented in our visual analytics software tool (http://mobiosd-hub.com/starpep/), to assist researchers in extracting useful information from an integrated collection of 45120 bioactive peptides, which is one of the largest and most diverse data in its field. Finally, we illustrate the applicability of the proposed workflow for discovering central nodes in molecular similarity networks that may represent a biologically relevant chemical space known to date.
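The citation statement above refers to Smith-Waterman local alignment as the basis for sequence identity comparisons. The following is a minimal sketch of that technique, not the BioJava implementation the authors used. It assumes a simple match/mismatch score and a linear gap penalty in place of the BLOSUM62 matrix, and it assumes identity is measured over the aligned columns; the excerpt does not state the gap scheme or the identity definition. The function names and default parameters are illustrative.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of strings a and b.

    Returns (best_score, aligned_a, aligned_b). Scoring is a simplification:
    fixed match/mismatch values and a linear gap penalty (assumed, not BLOSUM62).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned columns with identical residues (assumed definition)."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    score, top, bottom = smith_waterman("XXACDEYY", "ZZACDEWW")
    print(score, top, bottom, percent_identity(top, bottom))
```

A redundancy filter of the kind the statement describes could then discard one member of any pair whose identity exceeds a chosen cutoff, although the cutoff and the exact procedure are not given in the excerpt.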


“…A further advantage of our method is that it automatically solves the chain matching problem [21]. These advantages combined with speed, yield a system that compares favorably to quaternary structure search and alignment tools in terms of scalability [14,42,20,19]. Moreover, this method does not rely on atomic models and thus can be applied directly to the growing number of experimental maps obtained using 3D electron microscopy (3DEM) and available from the EMDB data resource (www.ebi.ac.uk/pdbe/emdb/).…”

Section: Discussion (mentioning)

confidence: 99%

“…Traditional comparison methods use atom level information, which can be complicated by the presence of topological permutations within a polypeptide chain and/or subunit rearrangement(s) within an oligomeric assembly. While solutions that address these problems exist [14,19,20,21], they are computationally expensive and will not necessarily scale with continued growth of the PDB.…”

Section: Introduction (mentioning)

confidence: 99%

Real time structural search of the Protein Data Bank

2020

Self Cite


Detection of protein structure similarity is a central challenge in structural bioinformatics. Comparisons are usually performed at the polypeptide chain level, however the functional form of a protein within the cell is often an oligomer. This fact, together with recent growth of oligomeric structures in the Protein Data Bank (PDB), demands more efficient approaches to oligomeric assembly alignment/retrieval. Traditional methods use atom level information, which can be complicated by the presence of topological permutations within a polypeptide chain and/or subunit rearrangements. These challenges can be overcome by comparing electron density volumes directly. But, brute force alignment of 3D data is a compute intensive search problem. We developed a 3D Zernike moment normalization procedure to orient electron density volumes and assess similarity with unprecedented speed. Similarity searching with this approach enables real-time retrieval of proteins/protein assemblies resembling a target, from PDB or user input, together with resulting alignments (http://shape.rcsb.org).


“…A further advantage of our method is that it automatically solves the chain matching problem [21]. These advantages combined with speed, yield a system that compares favorably to quaternary structure search and alignment tools in terms of scalability [14,42,20,19]. Moreover, this method can be applied directly to the growing number of electric Coulomb potential maps obtained using 3D electron microscopy (3DEM) and available from the EMDB data resource (https://www.ebi.ac.uk/pdbe/emdb/).…”

Section: Discussion (mentioning)

confidence: 99%

Real time structural search of the Protein Data Bank

Guzenko, Burley, Duarte. 2019, Preprint

Self Cite


Detection of protein structure similarity is a central challenge in structural bioinformatics. Comparisons are usually performed at the polypeptide chain level, however the functional form of a protein within the cell is often an oligomer. This fact, together with recent growth of oligomeric structures in the Protein Data Bank (PDB), demands more efficient approaches to oligomeric assembly alignment/retrieval. Traditional methods use atom level information, which can be complicated by the presence of topological permutations within a polypeptide chain and/or subunit rearrangements. These challenges can be overcome by comparing electron density maps directly. But, brute force alignment of electron density distributions is a compute intensive search problem. We developed a 3D Zernike moment normalization procedure to orient electron density maps and assess similarity with unprecedented speed. Similarity searching with this approach enables real-time retrieval of proteins/protein assemblies resembling a target, from PDB or user input, together with resulting alignments (http://shape.rcsb.org).
