Characterization of Graphs for Protein Structure Modeling and Recognition of Solubility

Journal of Biomolecular Structure and Dynamics

Maiorino

Giuliani

et al. 2016

Self Cite

In this paper, we present a generative model for protein contact networks (PCNs). The soundness of the proposed model is investigated by focusing primarily on mesoscopic properties derived from the spectra of the graph Laplacian. To complement the analysis, we also study classical topological descriptors, such as shortest-path statistics and modularity. Our experiments show that the proposed model is a considerable improvement over two suitably chosen generative mechanisms: it approximates real PCNs more closely in terms of diffusion properties derived from the normalized Laplacian spectra. However, like the other network models, it does not reproduce the shortest-path structure with sufficient accuracy. To compensate for this drawback, we designed a second step involving a targeted edge reconfiguration process. The ensemble of reconfigured networks shows further improvements that are statistically significant. As an important byproduct of our study, we demonstrate that modularity, a well-known property of proteins, does not entirely explain the actual network architecture characterizing PCNs. In fact, we conclude that modularity, understood as a quantification of an underlying community structure, should be considered an emergent property of the structural organization of proteins. Interestingly, this property is suitably optimized in PCNs together with path efficiency.
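The two classical topological descriptors named in the abstract, shortest-path statistics and modularity, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy graph, the function names, and the hand-picked partition are hypothetical and are not taken from the paper, which studies real PCNs and does not specify its implementation.

```python
from collections import deque


def average_shortest_path_length(adj):
    """Mean BFS distance over all ordered pairs of distinct, reachable vertices."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


def modularity(adj, community):
    """Newman modularity Q for a partition given as {vertex: community label}."""
    two_m = sum(len(neigh) for neigh in adj.values())  # every edge is counted twice
    q = 0.0
    for u in adj:
        for v in adj:
            if community[u] != community[v]:
                continue
            a_uv = 1 if v in adj[u] else 0
            q += a_uv - len(adj[u]) * len(adj[v]) / two_m
    return q / two_m


# Hypothetical toy graph: two triangles joined by a single edge (2-3).
toy = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

avg_path = average_shortest_path_length(toy)
q_value = modularity(toy, labels)
```

On this toy graph the partition into the two triangles gives a positive modularity, while the single bridging edge keeps the average path length short, a small-scale picture of how the two properties can coexist.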

Section: Dataset

mentioning

confidence: 99%

Section: Introduction

mentioning

confidence: 99%

A generative model for protein contact networks

Journal of Biomolecular Structure and Dynamics

Maiorino

Giuliani

et al. 2016

Self Cite

“…The biological networks analysed in this work are partially linked to those analysed in our previous works [30,31]. We consider 400 E. coli protein contact networks (PCN) as the main object of study and we compare them to several models.…”

Section: The Considered Data

mentioning

confidence: 99%

Multifractal characterization of protein contact networks

Maiorino

Physica A: Statistical Mechanics and its Applications

Giuliani

et al. 2015

Self Cite

The multifractal detrended fluctuation analysis of time series is able to reveal the presence of long-range correlations and, at the same time, to characterize the self-similarity of the series. The rich information derivable from the characteristic exponents and the multifractal spectrum can be further analyzed to discover important insights about the underlying dynamical process. In this paper, we employ multifractal analysis techniques in the study of protein contact networks. To this end, a network is first mapped to three different time series, each of which is generated by a stationary unbiased random walk. To capture the peculiarities of the networks at different levels, we consider three observables at each vertex: the degree, the clustering coefficient, and the closeness centrality. To compare the results with suitable references, we also consider instances of three well-known network models and two typical time series with pure monofractal and multifractal properties. The first result of notable interest is that time series associated with protein contact networks exhibit long-range correlations (strong persistence), which are consistent with signals in between the typical monofractal and multifractal behavior. Subsequently, a suitable embedding of the multifractal spectra allows us to focus on ensemble properties, which in turn lets us make further observations regarding the considered networks. In particular, we highlight the different roles that small and large fluctuations of the considered observables play in the characterization of the network topology.
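The mapping from a network to a time series described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy graph, and the choice to draw the starting vertex with probability proportional to its degree (the stationary distribution of an unbiased walk on a connected undirected graph) are all introduced here. Only two of the three observables (degree and clustering coefficient) are shown.

```python
import random


def local_clustering(adj, v):
    """Fraction of pairs of neighbours of v that are themselves connected."""
    neigh = sorted(adj[v])
    k = len(neigh)
    if k < 2:
        return 0.0
    links = sum(1 for i, a in enumerate(neigh) for b in neigh[i + 1:] if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def stationary_random_walk(adj, observable, steps, seed=0):
    """Record observable[v] at each vertex v visited by an unbiased random walk.

    The start vertex is sampled with probability proportional to its degree,
    so the walk begins in its stationary distribution (assumption: the graph
    is connected and undirected).
    """
    rng = random.Random(seed)
    vertices = sorted(adj)
    weights = [len(adj[v]) for v in vertices]
    current = rng.choices(vertices, weights=weights, k=1)[0]
    path, series = [current], [observable[current]]
    for _ in range(steps - 1):
        current = rng.choice(sorted(adj[current]))  # uniform over neighbours
        path.append(current)
        series.append(observable[current])
    return path, series


# Hypothetical toy graph: two triangles joined by a single edge (2-3).
toy = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
degree = {v: len(neigh) for v, neigh in toy.items()}
clustering = {v: local_clustering(toy, v) for v in toy}

walk_path, degree_series = stationary_random_walk(toy, degree, steps=50, seed=1)
_, clustering_series = stationary_random_walk(toy, clustering, steps=50, seed=1)
```

Each resulting series could then be passed to a multifractal detrended fluctuation analysis routine, which is outside the scope of this sketch.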

“…The analysis of large volumes of data is hampered by many technical problems, including the ones related to the quality and interpretation of associated information. One-class classifier design is an important research endeavour [1], [2] that can be used to tackle problems of anomaly/novelty detection or, more generally, to recognize outliers in incoming data [3]-[8]. Several different methods have been proposed in the literature, including clustering-based techniques, kernel methods, and statistical approaches (see [9] for a recent survey).…”

Section: Introduction

mentioning

confidence: 99%

“…We show experimental results on both synthetic and real-world datasets for one-class classification, containing samples represented as feature vectors and labeled graphs. In this paper, in addition to evaluating the method on well-known benchmarks, we also face the challenging problem of protein solubility recognition [3]. Classification of proteins with respect to their solubility degree is a hard yet very important scientific problem, with consequences related to the folding of such macro-molecules [43].…”

Section: Introduction

mentioning

confidence: 99%

One-Class Classifiers Based on Entropic Spanning Graphs

IEEE Trans. Neural Netw. Learning Syst.

Alippi

2017

Self Cite

One-class classifiers offer valuable tools to assess the presence of outliers in data. In this paper, we propose a design methodology for one-class classifiers based on entropic spanning graphs. Our approach can also process nonnumeric data by means of an embedding procedure. The spanning graph is learned on the embedded input data, and the resulting partition of vertices defines the classifier. The final partition is derived by exploiting a criterion based on mutual information minimization. Here, we compute the mutual information by using a convenient formulation provided in terms of the α-Jensen difference. Once training is completed, a graph-based fuzzy model is constructed in order to associate a confidence level with the classifier decision. The fuzzification process is based only on topological information of the vertices of the entropic spanning graph. As such, the proposed one-class classifier is also suitable for data characterized by complex geometric structures. We provide experiments on well-known benchmarks containing both feature vectors and labeled graphs. In addition, we apply the method to the protein solubility recognition problem by considering several representations for the input samples. Experimental results demonstrate the effectiveness and versatility of the proposed method with respect to other state-of-the-art approaches.
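For readers unfamiliar with the α-Jensen difference mentioned in the abstract, the sketch below evaluates it for two discrete probability distributions. This is only an illustration of the quantity itself: the paper estimates the relevant entropies from spanning graphs built on embedded data, whereas this example assumes the distributions are known explicitly. The function names and the example distributions are hypothetical.

```python
import math


def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1) of a discrete distribution."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)


def alpha_jensen_difference(p0, p1, weight, alpha):
    """Entropy of the mixture minus the weighted mean of the component entropies."""
    mixture = [weight * a + (1.0 - weight) * b for a, b in zip(p0, p1)]
    mean_entropy = (weight * renyi_entropy(p0, alpha)
                    + (1.0 - weight) * renyi_entropy(p1, alpha))
    return renyi_entropy(mixture, alpha) - mean_entropy


# Identical distributions: mixing adds no uncertainty, so the difference is zero.
same = alpha_jensen_difference([0.2, 0.8], [0.2, 0.8], weight=0.5, alpha=0.5)

# Distributions with disjoint supports: the difference is as large as it can be
# for an equal-weight mixture of two point masses.
disjoint = alpha_jensen_difference([1.0, 0.0], [0.0, 1.0], weight=0.5, alpha=0.5)
```

For α in (0, 1) the Rényi entropy is concave, so the difference is non-negative and grows as the two distributions become easier to tell apart.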