Abstract: Computational tools for the analysis of protein data and the prediction of biological properties are essential in life sciences and biomedical research. Here, we introduce ProtDCal‐Suite, a web server comprising a set of machine learning‐based methods for studying proteins. The main module of ProtDCal‐Suite is the ProtDCal software. ProtDCal translates the structural information of proteins into numerical descriptors that serve as input to machine‐learning techniques. The ProtDCal‐Suite server also incorporate…
“…Unlike the legacy descriptors, new descriptors (those in Table SI1-2) are implemented by applying statistical and aggregation operators on amino acid property vectors, e.g., measures of central tendency, statistical dispersion, OWA operators 54,55, and fuzzy Choquet integral operators 56,57. Further, it should be noted that the reasons for using these operators in the calculation of MDs have been demonstrated elsewhere [58][59][60][61][62]. Let D = [x_ij]_{n×m} be a descriptor matrix whose rows and columns represent peptide instances and calculated features, respectively, i.e., x_ij encodes the numerical value for the jth descriptor of the ith peptide sequence.…”
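The construction described in this citation statement can be illustrated with a minimal sketch. It is not the cited authors' implementation: the Kyte-Doolittle hydropathy scale is used only as an example amino acid property, and the function names and the linearly decreasing OWA weights (`linear_weights`) are assumptions chosen for illustration. Each peptide becomes one row of the descriptor matrix D, and each aggregation operator contributes one column.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy values, used here only as an example property scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_vector(seq):
    """Map a peptide sequence to its vector of per-residue property values."""
    return [KD[aa] for aa in seq]


def owa(values, weights):
    """Ordered weighted average: weights are applied to values sorted descending."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("OWA weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def linear_weights(n):
    """Hypothetical weighting scheme: linearly decreasing weights that sum to 1."""
    total = n * (n + 1) / 2
    return [(n - i) / total for i in range(n)]


def describe(seq):
    """One row of D: central tendency, dispersion, and an OWA aggregate."""
    v = property_vector(seq)
    return [mean(v), pstdev(v), owa(v, linear_weights(len(v)))]


def descriptor_matrix(peptides):
    """D = [x_ij]: row i is peptide i, column j is descriptor j."""
    return [describe(p) for p in peptides]
```

With weights `[1, 0, ..., 0]` the OWA operator returns the maximum, and with uniform weights it reduces to the arithmetic mean, which is why it is often described as a family of operators between the minimum and the maximum.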
The increasing interest in bioactive peptides with therapeutic potentials has been reflected in a large variety of biological databases published over the last years. However, the knowledge discovery process from these heterogeneous data sources is a nontrivial task, becoming the essence of our research endeavor. Therefore, we devise a unified data model based on molecular similarity networks for representing a chemical reference space of bioactive peptides, having an implicit knowledge that is currently not explicitly accessed in existing biological databases. Indeed, our main contribution is a novel workflow for the automatic construction of such similarity networks, enabling visual graph mining techniques to uncover new insights from the “ocean” of known bioactive peptides. The workflow presented here relies on the following sequential steps: (i) calculation of molecular descriptors by applying statistical and aggregation operators on amino acid property vectors; (ii) a two-stage unsupervised feature selection method to identify an optimized subset of descriptors using the concepts of entropy and mutual information; (iii) generation of sparse networks where nodes represent bioactive peptides, and edges between two nodes denote their pairwise similarity/distance relationships in the defined descriptor space; and (iv) exploratory analysis using visual inspection in combination with clustering and network science techniques. For practical purposes, the proposed workflow has been implemented in our visual analytics software tool (http://mobiosd-hub.com/starpep/), to assist researchers in extracting useful information from an integrated collection of 45120 bioactive peptides, which is one of the largest and most diverse data in its field. Finally, we illustrate the applicability of the proposed workflow for discovering central nodes in molecular similarity networks that may represent a biologically relevant chemical space known to date.
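Step (iii) of the workflow, building a sparse network from pairwise relationships in descriptor space, can be sketched as follows. The abstract does not state which distance metric or sparsification rule is used, so Euclidean distance and a fixed `cutoff` are assumptions here, and the function names are hypothetical. Node degree is included as one simple notion of a "central" node; the paper may rely on other centrality measures.

```python
from itertools import combinations
from math import dist


def similarity_network(rows, cutoff):
    """Return edges {(i, j): distance} for descriptor rows closer than cutoff.

    Assumption: Euclidean distance with a fixed threshold; the source does not
    specify the metric or how the network is made sparse.
    """
    edges = {}
    for i, j in combinations(range(len(rows)), 2):
        d = dist(rows[i], rows[j])
        if d <= cutoff:
            edges[(i, j)] = d
    return edges


def degree(edges, n_nodes):
    """Count the edges incident to each node (a basic centrality measure)."""
    deg = [0] * n_nodes
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg
```

Lowering the cutoff yields a sparser graph, so the threshold controls how much of the pairwise structure survives into the network.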
“…To localize the region in B2M responsible for its antibacterial activity, the amino acid sequence of B2M was subjected to a massive virtual screening of all possible fragments with a sequence length between 10 and 30 residues (2078 peptides in total). Our in-house machine-learning-based predictor, ABP-Finder (https://protdcal.zmb.uni-due.de/ABP-Finder/index.php) [28], was used to first identify putative antibacterial peptides (ABP), and to predict whether the bacterial targets for each of these ABP belong to the classes Gram-positive, Gram-negative, or to both types of the Gram staining assay. ABP-Finder was used to score the 2078 peptides derived from B2M.…”
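The fragment enumeration described in this statement amounts to a sliding window over the parent sequence for every allowed length. The sketch below shows that idea under stated assumptions: the function name and the 1-based start positions are illustrative, and the scoring step with ABP-Finder is not reproduced because its programmatic interface is not described in the source.

```python
def enumerate_fragments(sequence, min_len=10, max_len=30):
    """List every contiguous fragment with min_len <= length <= max_len.

    Returns (start, fragment) tuples, where start is the 1-based position of
    the fragment's first residue in the parent sequence.
    """
    fragments = []
    for length in range(min_len, max_len + 1):
        # A sequence of N residues contains N - length + 1 windows of this size;
        # the range is empty when the sequence is shorter than the window.
        for start in range(len(sequence) - length + 1):
            fragments.append((start + 1, sequence[start:start + length]))
    return fragments
```

For a toy 12-residue sequence, the call `enumerate_fragments("ACDEFGHIKLMN")` yields three fragments of length 10, two of length 11, and one of length 12. Whether identical fragments arising at different positions were merged in the cited study is not stated.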
The respiratory tract is a major entry site for microbial pathogens. To combat bacterial infections, the immune system has various defense mechanisms at its disposal, including antimicrobial peptides (AMPs). To search for novel AMPs from the respiratory tract, a peptide library from human broncho-alveolar-lavage (BAL) fluid was screened for antimicrobial activity by radial diffusion assays allowing the efficient detection of antibacterial activity within a small sample size. After repeated testing-cycles and subsequent purification, we identified β-2-microglobulin (B2M) in antibacterially active fractions. B2M belongs to the MHC-1 receptor complex present at the surface of nucleated cells. It is known to inhibit the growth of Listeria monocytogenes and Escherichia coli and to facilitate phagocytosis of Staphylococcus aureus. Using commercially available B2M we confirmed a dose-dependent inhibition of Pseudomonas aeruginosa and L. monocytogenes. To characterize AMP activity within the B2M sequence, peptide fragments of the molecule were tested for antimicrobial activity. Activity could be localized to the C-terminal part of B2M. Investigating pH dependency of the antimicrobial activity of B2M demonstrated an increased activity at pH values of 5.5 and below, a hallmark of infection and inflammation. Sytox green uptake into bacterial cells following the exposure to B2M was determined and revealed a pH-dependent loss of bacterial membrane integrity. TEM analysis showed areas of disrupted bacterial membranes in L. monocytogenes incubated with B2M and high amounts of lysed bacterial cells. In conclusion, B2M as part of a ubiquitous cell surface complex may represent a potent antimicrobial agent by interfering with bacterial membrane integrity.
“…For every one of the proteins in these two sets, the corresponding three-dimensional structure was obtained from the Protein Data Bank [41]. The RCCs for each protein were calculated as previously described [21], but we varied the distance criterion from 4 to 15 Å (4,5,6,7,8,9,10,11,12,13,14, and 15 Å) and either included or did not include the atoms of the sidechains. Then, the resulting RCCs for every pair of proteins (positive and negative sets of PPI) were added or concatenated to produce a single numeric representation for every protein pair, with 26 or 52 features (RCC1, RCC2, RCC3, .…”
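The two ways of combining per-protein RCC vectors into a pair representation can be sketched as below. This assumes each protein is already described by a 26-element RCC vector, as the feature counts in the statement imply; the function name and the `mode` argument are hypothetical, and computing the RCCs from a structure is outside the scope of this sketch.

```python
RCC_SIZE = 26  # number of residue cluster classes per protein, per the statement


def combine_rcc(rcc_a, rcc_b, mode):
    """Combine two per-protein RCC vectors into one protein-pair feature vector.

    mode="add"    -> element-wise sum, 26 features (order of the pair is irrelevant)
    mode="concat" -> concatenation, 52 features (order of the pair is preserved)
    """
    if len(rcc_a) != RCC_SIZE or len(rcc_b) != RCC_SIZE:
        raise ValueError("each RCC vector must have 26 elements")
    if mode == "add":
        return [a + b for a, b in zip(rcc_a, rcc_b)]
    if mode == "concat":
        return list(rcc_a) + list(rcc_b)
    raise ValueError("mode must be 'add' or 'concat'")
```

One design difference is worth noting: the summed form is symmetric in the two proteins, whereas the concatenated form gives a different vector when the pair is swapped, which a classifier would treat as a distinct input.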
Predicting protein–protein interactions (PPI) represents an important challenge in structural bioinformatics. Current computational methods display different degrees of accuracy when predicting these interactions. Different factors were proposed to help improve these predictions, including choosing the proper descriptors of proteins to represent these interactions, among others. In the current work, we provide a representative protein structure that is amenable to PPI classification using machine learning approaches, referred to as residue cluster classes. Through sampling and optimization, we identified the best algorithm–parameter pair to classify PPI from more than 360 different training sets. We tested these classifiers against PPI datasets that were not included in the training set but shared sequence similarity with proteins in the training set to reproduce the situation of most proteins sharing sequence similarity with others. We identified a model with almost no PPI error (96–99% of correctly classified instances) and showed that residue cluster classes of protein pairs displayed a distinct pattern between positive and negative protein interactions. Our results indicated that residue cluster classes are structural features relevant to model PPI and provide a novel tool to mathematically model the protein structure/function relationship.