Network Reconstruction Based on Proteomic Data and Prior Knowledge of Protein Connectivity Using Graph Theory

Stavrakas, Vassilis; Melas, Ioannis N.; Sakellaropoulos, Theodore; Alexopoulos, Leonidas G.

doi:10.1371/journal.pone.0128411

Cited by 5 publications

(4 citation statements)

References 47 publications

“…Cutoff optimization as presented in this paper is a flexible and generalizable inference strategy. Most other methods that account for prior knowledge integrate the biological reference directly into a specific network inference or regression framework [25][26][27][28][29][30][31], for example, by penalizing or enhancing specific edges according to the biological reference. On the contrary, our approach uses prior knowledge as an external reference system to optimize the purely data-driven association matrix.…”

Section: Discussion (mentioning)

confidence: 99%

A strategy to incorporate prior knowledge into correlation network cutoff selection

et al. 2020

Correlation networks are frequently used to statistically extract biological interactions between omics markers. Network edge selection is typically based on the statistical significance of the correlation coefficients. This procedure, however, is not guaranteed to capture biological mechanisms. We here propose an alternative approach for network reconstruction: a cutoff selection algorithm that maximizes the overlap of the inferred network with available prior knowledge. We first evaluate the approach on IgG glycomics data, for which the biochemical pathway is known and well-characterized. Importantly, even in the case of incomplete or incorrect prior knowledge, the optimal network is close to the true optimum. We then demonstrate the generalizability of the approach with applications to untargeted metabolomics and transcriptomics data. For the transcriptomics case, we demonstrate that the optimized network is superior to statistical networks in systematically retrieving interactions that were not included in the biological reference used for optimization.
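The cutoff selection idea in this abstract can be illustrated with a minimal sketch. It assumes the overlap between the inferred network and the prior knowledge is scored with an F1 measure, which is a stand-in chosen for illustration and not necessarily the score used by the authors. The function names `f1_overlap` and `best_cutoff` and the toy markers A to D are hypothetical.

```python
def f1_overlap(edges, prior):
    """F1 score between an inferred edge set and a prior-knowledge edge set.

    Assumption: F1 is used here only as a simple overlap measure.
    """
    true_positives = len(edges & prior)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(edges)
    recall = true_positives / len(prior)
    return 2 * precision * recall / (precision + recall)


def best_cutoff(corr, prior, cutoffs):
    """Return the (cutoff, score) pair with the highest overlap score.

    corr maps frozenset({a, b}) to a correlation coefficient.
    An edge is kept when its absolute correlation reaches the cutoff.
    """
    best = (None, -1.0)
    for cutoff in cutoffs:
        edges = {pair for pair, r in corr.items() if abs(r) >= cutoff}
        score = f1_overlap(edges, prior)
        if score > best[1]:
            best = (cutoff, score)
    return best


# Toy data (hypothetical): four markers, two known interactions.
corr = {
    frozenset({"A", "B"}): 0.9,
    frozenset({"B", "C"}): 0.7,
    frozenset({"A", "C"}): 0.3,
    frozenset({"C", "D"}): 0.2,
}
prior = {frozenset({"A", "B"}), frozenset({"B", "C"})}

cutoff, score = best_cutoff(corr, prior, [0.1, 0.25, 0.5, 0.8])
print(cutoff, score)
```

In this toy case the cutoff 0.5 keeps exactly the two known edges, so it scores highest. A lower cutoff adds edges absent from the prior, and a higher one drops a known edge.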

“…Cutoff optimization as presented in this paper is a very flexible and generalizable inference strategy. Most other methods that account for prior knowledge integrate the biological reference directly into a specific network inference or regression framework [24][25][26][27][28][29][30], for example by penalizing or enhancing specific edges according to the biological reference. On the contrary, our approach uses prior knowledge as an external reference system to optimize the purely data-driven association matrix.…”

Section: Discussion (mentioning)

confidence: 99%

A strategy to incorporate prior knowledge into correlation network cutoff selection

Benedetti, Pučić‐Baković, Keser et al. 2019

Preprint

Correlation networks are commonly used to statistically extract biological interactions between omics markers. Network edge selection is typically based on the significance of the underlying correlation coefficients. A statistical cutoff, however, is not guaranteed to capture biological reality, and heavily depends on dataset properties such as sample size. We here propose an alternative, innovative approach to address the problem of network reconstruction. Specifically, we developed a cutoff selection algorithm that maximizes the agreement to a given ground truth. We first evaluate the approach on IgG glycomics data, for which the biochemical pathway is known and well-characterized. The optimal network outperforms networks obtained with statistical cutoffs and is robust with respect to sample size. Importantly, we can show that even in the case of incomplete or incorrect prior knowledge, the optimal network is close to the true optimum. We then demonstrate the generalizability of the approach on an untargeted metabolomics and a transcriptomics dataset from The Cancer Genome Atlas (TCGA). For the transcriptomics case, we demonstrate that the optimized network is superior to statistical networks in systematically retrieving interactions that were not included in the biological reference used for the optimization. Overall, this paper shows that using prior information for correlation network inference is superior to using regular statistical cutoffs, even if the prior information is incomplete or partially inaccurate.

Keywords: Correlation cutoff / Correlation Networks / Gaussian Graphical Models / Network inference / Prior knowledge

As expected, for both Pearson correlation and parcor, the significance cutoff, i.e. the smallest still-significant correlation coefficient (in absolute value), decreases with increasing sample size and does not converge even for larger sample sizes (Figure 2A, red and blue curves, respectively).

Interestingly, partial correlations estimated with GeneNet do not show the same behavior, as the statistical correlation cutoff is fairly stable across the considered sample sizes (Figure 2A, black line). This is also reflected in the total number of edges in the resulting network: while for Pearson correlation and parcor the number of significant coefficients included in the network systematically increases with the sample size, the network estimated with GeneNet maintains a roughly constant number of edges (Figure 2B). As an example, when considering twice as many samples, from 200 to 400, the GeneNet network remains stable with around 60 edges, while the Pearson correlation network increases by a factor of roughly 1.2 (from 655 to 790) and the parcor network increases by a factor of 1.5 (from 95 to 155). Analogous results were obtained in the three replication cohorts (Figure S1). This first analysis showed that there is indeed a strong dependence of network density (number of significant correlations) on the sample size of the dataset for both Pearson and partial correlations. GeneNet did not show t...
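The dependence of the significance cutoff on sample size described above can be sketched for plain Pearson correlation. This is a rough illustration that assumes the Fisher z approximation and an uncorrected two-sided test at level alpha. The excerpt does not state which test or multiple-testing correction was used, so the numbers will not match the figures it cites. The function name `min_significant_r` is hypothetical.

```python
from math import sqrt, tanh
from statistics import NormalDist


def min_significant_r(n, alpha=0.05):
    """Approximate smallest |r| that is significant for n samples.

    Assumption: Fisher z approximation, two-sided test, no
    multiple-testing correction.
    """
    if n <= 3:
        raise ValueError("need more than 3 samples")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # atanh(r) has standard error of about 1 / sqrt(n - 3) under the null
    return tanh(z_crit / sqrt(n - 3))


for n in (50, 100, 200, 400, 800):
    print(n, round(min_significant_r(n), 3))
```

The printed cutoff keeps shrinking as n grows, which is why a network thresholded on significance alone tends to become denser with larger cohorts.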

“…With this method, biological entities such as genes, proteins, small compounds and RNAs are represented as nodes, and the interactions among nodes (termed edges or relationships) denote biological relationships among the biological entities. Traversal algorithms of graphical models have been used to mine valuable relationships across networks (Stavrakas et al 2015) that might be omitted by traditional relational database search methods. However, a graph-based database for analyzing the biological networks that control signal transduction, metabolism and gene regulation is still lacking for the end-users, i.e.…”

Section: Introduction (mentioning)

confidence: 99%

HRGRN: A Graph Search-Empowered Integrative Database of Arabidopsis Signaling Transduction, Metabolism and Gene Regulation Networks

Dai, Liu et al. 2015

Plant Cell Physiol

The biological networks controlling plant signal transduction, metabolism and gene regulation are composed of not only tens of thousands of genes, compounds, proteins and RNAs but also the complicated interactions and co-ordination among them. These networks play critical roles in many fundamental mechanisms, such as plant growth, development and environmental response. Although much is known about these complex interactions, the knowledge and data are currently scattered throughout the published literature, publicly available high-throughput data sets and third-party databases. Many ‘unknown’ yet important interactions among genes need to be mined and established through extensive computational analysis. However, exploring these complex biological interactions at the network level from existing heterogeneous resources remains challenging and time-consuming for biologists. Here, we introduce HRGRN, a graph search-empowered integrative database of Arabidopsis signal transduction, metabolism and gene regulatory networks. HRGRN utilizes Neo4j, which is a highly scalable graph database management system, to host large-scale biological interactions among genes, proteins, compounds and small RNAs that were either validated experimentally or predicted computationally. The associated biological pathway information was also specially marked for the interactions that are involved in the pathway to facilitate the investigation of cross-talk between pathways. Furthermore, HRGRN integrates a series of graph path search algorithms to discover novel relationships among genes, compounds, RNAs and even pathways from heterogeneous biological interaction data that could be missed by traditional SQL database search methods. Users can also build subnetworks based on known interactions. The outcomes are visualized with rich text, figures and interactive network graphs on web pages. The HRGRN database is freely available at http://plantgrn.noble.org/hrgrn/.
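The kind of graph path search mentioned in this abstract can be illustrated with a small breadth-first search. This is a generic sketch in plain Python. It is not HRGRN's implementation and does not use the Neo4j API, and the node names in the toy network are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search for a shortest path in a directed graph.

    graph maps each node to a list of its successors.
    Returns the path as a list of nodes, or None if goal is unreachable.
    """
    if start == goal:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in parents:
                continue
            parents[neighbour] = node
            if neighbour == goal:
                # Walk the parent links back to the start node.
                path = [neighbour]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Toy network (hypothetical): gene -> protein -> compound -> gene.
network = {
    "geneA": ["proteinA"],
    "proteinA": ["compoundX"],
    "compoundX": ["geneB"],
    "geneB": [],
}
print(shortest_path(network, "geneA", "geneB"))
```

An indirect relationship such as geneA to geneB spans several hops here, which illustrates why a multi-hop path search can surface links that a single-table lookup would miss.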
