Ranking genome-wide correlation measurements improves microarray and RNA-seq based global and targeted co-expression networks

Liesecke, Franziska; Daudu, Dimitri; Bernonville, Rodolphe Dugé de; Besseau, Sébastien; Clastre, Marc; Courdavault, Vincent; Craene, Johan-Owen De; Crèche, Joël; Giglioli-Guivarc’h, Nathalie; Glévarec, Gaëlle; Pichon, Olivier; Bernonville, Thomas Dugé de

doi:10.1038/s41598-018-29077-3

Cited by 67 publications

(50 citation statements)

References 48 publications


“…A recent study focused on microarray and RNA‐seq based global and targeted co‐expression networks in Arabidopsis (Liesecke et al., ). This study identified Pathway Level Co‐expression using a set of guide genes, and compared how Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), their respective ranked values (Highest Reciprocal Rank (HRR)), Mutual Information (MI) and Partial Correlations (PC) performed on global networks.…”

Section: Discussion (mentioning)

confidence: 99%
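The HRR measure named in this statement replaces raw correlation values with ranks. A minimal sketch, assuming the commonly used definition HRR(a, b) = max(rank of b among a's correlation partners, rank of a among b's partners), computed here on Pearson correlations. The function name `highest_reciprocal_rank` is hypothetical, ties are broken arbitrarily, and this is not presented as the cited study's implementation:

```python
import numpy as np


def highest_reciprocal_rank(expr: np.ndarray) -> np.ndarray:
    """HRR matrix for a genes x samples expression matrix (lower = stronger)."""
    pcc = np.corrcoef(expr)              # gene-by-gene Pearson correlation
    n = pcc.shape[0]
    np.fill_diagonal(pcc, -np.inf)       # a gene is never its own partner
    # For each gene (row), sort partners by descending PCC; rank 1 = best partner.
    order = np.argsort(-pcc, axis=1)
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    # HRR keeps the worse (larger) of the two reciprocal ranks.
    hrr = np.maximum(ranks, ranks.T)
    np.fill_diagonal(hrr, 0)
    return hrr
```

A pair only gets a low HRR when each gene ranks the other near the top of its own partner list, which puts every gene on a common scale regardless of how high its raw correlation values happen to be.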

Identification of gene expression logical invariants in Arabidopsis

Pandey, Sahoo (2019), Plant Direct


Numerous gene expression datasets from diverse tissue samples of the plant species Arabidopsis thaliana have already been deposited in the public domain. There have been several attempts to perform large-scale meta‐analyses of these datasets. Most of these analyses summarize pairwise gene expression relationships using correlation, or identify genes differentially expressed between two conditions. We propose here a new large-scale meta‐analysis of the publicly available Arabidopsis datasets to identify Boolean logical relationships between genes. Boolean logic is a branch of mathematics that deals with variables taking one of two possible values. In the context of gene expression datasets, we use qualitative high and low expression values. When the expression values of two genes are plotted against each other and split into four quadrants by their high/low thresholds, a strong logical relationship between the genes emerges if at least one of the quadrants is sparsely populated. We point out serious issues in data normalization steps that have been widely accepted and recently published in this context. We have built a web resource where gene expression relationships can be explored online, which helps visualize the logical relationships between genes. We believe that this website will be useful in identifying important genes in different biological contexts. The web link is http://hegemon.ucsd.edu/plant/ .
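The quadrant criterion in this abstract can be sketched as follows. This is a simplified illustration, not the study's method: it assumes per-gene median thresholds for high/low (a hypothetical choice) and calls a quadrant sparse when it holds fewer than `frac` times the number of samples expected if the two genes were independent. The names `sparse_quadrants` and `frac` are hypothetical:

```python
from statistics import median


def sparse_quadrants(x, y, frac=0.1):
    """Return the names of sparsely populated high/low quadrants for genes X, Y."""
    tx, ty = median(x), median(y)  # hypothetical thresholds: per-gene median
    n = len(x)
    counts = {"lowX_lowY": 0, "lowX_highY": 0, "highX_lowY": 0, "highX_highY": 0}
    for a, b in zip(x, y):
        key = ("highX" if a > tx else "lowX") + "_" + ("highY" if b > ty else "lowY")
        counts[key] += 1

    p_high_x = sum(a > tx for a in x) / n
    p_high_y = sum(b > ty for b in y) / n
    sparse = []
    for key, observed in counts.items():
        px = p_high_x if key.startswith("highX") else 1 - p_high_x
        py = p_high_y if key.endswith("highY") else 1 - p_high_y
        expected = n * px * py  # count expected under independence
        if observed < frac * expected:
            sparse.append(key)
    return sparse
```

A single sparse quadrant suggests an implication (an empty `highX_lowY` quadrant reads as "X high implies Y high"), while two opposite sparse quadrants suggest that the genes are equivalent or opposite in expression.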



“…where c_i is equal to the corresponding confidence level (i.e., 68% = 1, 95% = 2, 99% = 3). Only gene names in common between the original data file and the XPRESSpipe output were used for the method comparisons. Correlations between methods or replicates were calculated using a Spearman rank correlation coefficient, computed with the scipy.stats.spearmanr() function [32]. Pearson correlation coefficients were calculated on log10(rpm(counts) + 1)-transformed data using the scipy.stats.pearsonr() function.…”

Section: Confidence Interval Plotting (mentioning)

confidence: 99%
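The comparison described in this statement can be sketched with the two SciPy functions it names. This is a minimal sketch, not XPRESSpipe's code: it assumes `rpm(counts)` means counts scaled to reads per million of each table's full library, and the helper names `rpm_table` and `compare_methods` are hypothetical:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def rpm_table(counts: dict) -> dict:
    """Scale raw counts to reads per million using the full library size."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def compare_methods(counts_a: dict, counts_b: dict):
    """Correlate two gene -> count tables on the genes they share."""
    shared = sorted(set(counts_a) & set(counts_b))  # genes present in both outputs
    rho, _ = spearmanr([counts_a[g] for g in shared],
                       [counts_b[g] for g in shared])  # rank-based

    rpm_a, rpm_b = rpm_table(counts_a), rpm_table(counts_b)
    log_a = np.log10(np.array([rpm_a[g] for g in shared]) + 1)
    log_b = np.log10(np.array([rpm_b[g] for g in shared]) + 1)
    r, _ = pearsonr(log_a, log_b)  # linear association on log10(RPM + 1)
    return float(rho), float(r)
```

Spearman ρ is computed on raw counts because a monotone rescaling such as RPM does not change ranks. Pearson r is sensitive to scale, which is why the log transform is applied first.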

“…Raw data were processed in a protected high-performance computing environment. Correlations between methods or replicates were calculated using a Spearman rank correlation coefficient, computed with the scipy.stats.spearmanr() function [32]. Interactive scatter plots were generated using Plotly Express [22].…”

Section: TCGA Data Analysis (mentioning)

confidence: 99%

XPRESSyourself: Enhancing, standardizing, and automating ribosome profiling computational analyses yields improved insight into data

et al. (2020)


To further validate the design, reliability, and versatility of the XPRESSpipe pipeline, we processed raw TCGA sequence data using XPRESSpipe and compared the output count values to those publicly available through TCGA [1]. Spearman ρ values for the selected samples ranged from 0.979 to 0.980 when pseudogenes were excluded (Figure 1), indicating that XPRESSpipe performs with accuracy similar to the TCGA RNA-Seq processing standards. The remaining differences in reported counts can be explained by a few key factors. For instance, the XPRESSpipe-processed files are aligned to the Homo sapiens GRChv98 reference transcriptome, while the original count data are aligned to the GRChv79 reference transcriptome. Using a different transcriptome reference can change the final quantified values for several genes (Figure 2), as our understanding of the transcribed regions of the human genome has advanced significantly between versions.

Another source of dissimilarity arises when a reference containing only Ensembl canonical transcripts is used during quantification. The TCGA-processed data used an unmodified transcriptome reference file (all transcripts); the modified (Ensembl canonical transcripts only) GTF therefore produces different quantifications for some genes, because quantification is constrained to a single transcript version of each gene and a read is not counted if it maps to an exon not used by the canonical transcript. Even the XPRESSpipe settings closest to the TCGA pipeline, using the same genome and transcriptome versions, resulted in some variation (Figure 2, plot enclosed in maroon). A more detailed analysis of these differences shows that virtually all genes exhibiting variance between the processing methods are pseudogenes, with the TCGA pipeline accepting and quantifying more pseudogenes at the time of the initial analysis of this dataset. This may reflect the difficulty of recognizing these reads as multi-mapping to both the original gene and its pseudogene (Figures 3, 4 and 5; interactive plots accompanying Figure 5 can be accessed at [2]).
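Excluding pseudogenes before computing Spearman ρ, as in the comparison above, can be sketched as a filter on gene annotation. This is a minimal sketch under stated assumptions: `biotype` is a hypothetical gene-to-biotype mapping (for example, built from the `gene_biotype` attribute of an Ensembl GTF), any biotype containing "pseudogene" is dropped, and `spearman_excluding_pseudogenes` is a hypothetical name, not part of XPRESSpipe:

```python
from scipy.stats import spearmanr


def spearman_excluding_pseudogenes(counts_a: dict, counts_b: dict, biotype: dict):
    """Spearman rho between two pipelines' counts, skipping pseudogenes.

    Returns (rho, number_of_genes_compared).
    """
    genes = [
        g for g in sorted(set(counts_a) & set(counts_b))
        if "pseudogene" not in biotype.get(g, "")  # e.g. processed_pseudogene
    ]
    rho, _ = spearmanr([counts_a[g] for g in genes],
                       [counts_b[g] for g in genes])
    return float(rho), len(genes)
```

Reporting the number of genes alongside ρ makes it visible how much of the transcriptome the filter removed.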


“…Traditionally, statistics-based co-expression indices have been used to calculate the dependencies between genes [5, 7]. Some of the most popular correlation coefficients are Pearson, Kendall and Spearman [11, 12, 13]. Despite their popularity, statistics-based measures present some limitations [14].…”

Section: Introduction (mentioning)

confidence: 99%
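To illustrate how the three coefficients named in this statement can disagree, the sketch below applies them to a strictly increasing but strongly non-linear relationship. It shows one well-known difference (Pearson measures linear association, while Spearman and Kendall depend only on ranks); it is not necessarily the limitation the cited reference [14] discusses:

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

x = np.arange(1, 11, dtype=float)
y = np.exp(x)  # monotonically increasing, far from linear

r, _ = pearsonr(x, y)      # linear association: noticeably below 1
rho, _ = spearmanr(x, y)   # rank correlation: 1 for any monotone increase
tau, _ = kendalltau(x, y)  # pairwise concordance: also 1 here
print(f"Pearson r={r:.3f}  Spearman rho={rho:.3f}  Kendall tau={tau:.3f}")
```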

Computational Analysis of the Global Effects of Ly6E in the Immune Response to Coronavirus Infection Using Gene Networks

Delgado-Chaves, Gómez-Vela, Divina, et al. (2020), Genes


Gene networks have emerged as a promising tool for the comprehensive modeling and analysis of complex diseases. Particularly in viral infections, understanding host-pathogen mechanisms and the immune response to them is considered a major goal for the rational design of appropriate therapies. For this reason, gene networks may well support therapy-oriented research in the context of the coronavirus pandemic, guiding experimental scrutiny and reducing costs. In this work, gene co-expression networks were reconstructed from RNA-Seq expression data with the aim of analyzing the time-resolved effects of the gene Ly6E on the immune response against the coronavirus responsible for murine hepatitis (MHV). By integrating differential expression analyses with exploration of the reconstructed networks, significant differences in the immune response to the virus were observed in Ly6E^ΔHSC mice compared to wild-type animals. The results show that Ly6E ablation in hematopoietic stem cells (HSCs) leads to a progressively impaired immune response in both liver and spleen. Specifically, depletion of normal leukocyte-mediated immunity and chemokine signaling is observed in the liver of Ly6E^ΔHSC mice. In the spleen, by contrast, the immune response, which in the normal situation appears to be mediated by intense chromatin activity, is replaced by ECM remodeling in Ly6E^ΔHSC mice. These findings, which require further experimental characterization, could be extrapolated to other coronaviruses and motivate efforts toward novel antiviral approaches.
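The abstract does not describe how its networks were reconstructed, so the following is only a generic sketch of the simplest form of co-expression network: connect every gene pair whose absolute Pearson correlation across samples reaches a cutoff. The function name `coexpression_edges` and the default `threshold=0.8` are hypothetical choices, not values from the study:

```python
import numpy as np


def coexpression_edges(expr: np.ndarray, genes: list, threshold: float = 0.8):
    """Return (gene_a, gene_b, r) for pairs with |Pearson r| >= threshold.

    expr: genes x samples matrix, rows in the same order as `genes`.
    """
    corr = np.corrcoef(expr)  # gene-by-gene correlation matrix
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):  # upper triangle: each pair once
            if abs(corr[i, j]) >= threshold:
                edges.append((genes[i], genes[j], float(corr[i, j])))
    return edges
```

Keeping the signed correlation on each edge preserves the distinction between positively and negatively co-expressed pairs, even though the cutoff is applied to the absolute value.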

