Abstract:Co-expression networks are essential tools to infer biological associations between gene products and predict gene annotation. Global networks can be analyzed at the transcriptome-wide scale or after querying them with a set of guide genes to capture the transcriptional landscape of a given pathway in a process named Pathway Level Coexpression (PLC). A critical step in network construction remains the definition of gene co-expression. In the present work, we compared how Pearson Correlation Coefficient (PCC), … Show more
“…A recent study focused on microarray and RNA‐seq based global and targeted co‐expression networks in Arabidopsis (Liesecke et al., ). This study identified Pathway Level Co‐expression using a set of guide genes, and compared how Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), their respective ranked values (Highest Reciprocal Rank (HRR)), Mutual Information (MI) and Partial Correlations (PC) performed on global networks.…”
Numerous gene expression datasets from diverse tissue samples from the plant variety
Arabidopsis thaliana
have been already deposited in the public domain. There have been several attempts to do large scale meta‐analyses of all of these datasets. Most of these analyses summarize pairwise gene expression relationships using correlation, or identify differentially expressed genes in two conditions. We propose here a new large scale meta‐analysis of the publicly available
Arabidopsis
datasets to identify Boolean logical relationships between genes. Boolean logic is a branch of mathematics that deals with two possible values. In the context of gene expression datasets we use qualitative high and low expression values. A strong logical relationship between genes emerges if at least one of the quadrants is sparsely populated. We pointed out serious issues in the data normalization steps widely accepted and published recently in this context. We put together a web resource where gene expression relationships can be explored online which helps visualize the logical relationships between genes. We believe that this website will be useful in identifying important genes in different biological context. The web link is
http://hegemon.ucsd.edu/plant/
.
“…A recent study focused on microarray and RNA‐seq based global and targeted co‐expression networks in Arabidopsis (Liesecke et al., ). This study identified Pathway Level Co‐expression using a set of guide genes, and compared how Pearson Correlation Coefficient (PCC), Spearman Correlation Coefficient (SCC), their respective ranked values (Highest Reciprocal Rank (HRR)), Mutual Information (MI) and Partial Correlations (PC) performed on global networks.…”
Numerous gene expression datasets from diverse tissue samples from the plant variety
Arabidopsis thaliana
have been already deposited in the public domain. There have been several attempts to do large scale meta‐analyses of all of these datasets. Most of these analyses summarize pairwise gene expression relationships using correlation, or identify differentially expressed genes in two conditions. We propose here a new large scale meta‐analysis of the publicly available
Arabidopsis
datasets to identify Boolean logical relationships between genes. Boolean logic is a branch of mathematics that deals with two possible values. In the context of gene expression datasets we use qualitative high and low expression values. A strong logical relationship between genes emerges if at least one of the quadrants is sparsely populated. We pointed out serious issues in the data normalization steps widely accepted and published recently in this context. We put together a web resource where gene expression relationships can be explored online which helps visualize the logical relationships between genes. We believe that this website will be useful in identifying important genes in different biological context. The web link is
http://hegemon.ucsd.edu/plant/
.
“…where ci is equal to the corresponding confidence level (i.e., 68% = 1, 95% = 2, 99% = 3 Only gene names in common between the original data file and XPRESSpipe output were used for the method comparisons. Correlation between methods or replicates were calculated using a Spearman rank correlation coefficient, performed using the scipy.stats.spearmanr() function [32]. Pearson correlation coefficients were calculated using log 10 (rpm(counts) + 1) transformed data and the scipy.stats.pearsonr() function.…”
Section: Confidence Interval Plottingmentioning
confidence: 99%
“…Raw data were processed on a protected high-performance computing environment. Correlations between methods or replicates were calculated using a Spearman rank correlation coefficient, performed using the scipy.stats.spearman() function [32]. Interactive scatter plots were generated using Plotly Express [22].…”
To further validate the design, reliability, and versatility of the XPRESSpipe pipeline, we processed raw TCGA sequence data using XPRESSpipe and compared the output count values to those publicly available through TCGA [1]. Spearman ρ values for the selected samples ranged from 0.979-0.980 when pseudogenes were excluded (Figure 1), indicating XPRESSpipe performs with similar accuracy to the TCGA RNA-Seq processing standards. The differences in reported counts can be accounted for by a couple of key differences. For instance, the XPRESSpipe-processed files are aligned to the Homo sapiens GRChv98 reference transcriptome, while the original count data are aligned to the GRChv79 reference transcriptome. The use of a different transcriptome reference can result in variance in the final quantified data for several genes (Figure 2) as significant advances have been made in our understanding of transcribed regions of the human genome between versions.Another source of dissimilarity in data processing appears to arise if an Ensembl canonical transcripts-only reference is used during quantification. TCGA-processed data used an unmodified transcriptome reference file (all transcripts); therefore, the use of this modified (Ensembl canonical transcripts only) GTF will produce varied quantification for some genes as quantifications are constrained to a single transcript version of a given gene and a read will not be quantified if mapping to an exon not used by the canonical transcript. Even using XPRESSpipe settings closest to the TCGA pipeline and using the same genome and transcriptome version resulted in some variation (Figure 2, plot enclosed in maroon). By performing a more detailed analysis of these differences, it is clear that virtually all genes exhibiting variance between the processing methods are pseudogenes, with the TCGA pipeline accepting and quantifying more pseudogenes at the time of initial analysis of this dataset. This can be indicative of the difficulty surrounding the recognition of these reads as multi-mapping to both the original gene and pseudogene (Figure 3,4,5; interactive plots accompanying Figure 5 can be accessed at [2].
“…Traditionally, statistical-based co-expression indices have been used to calculate the dependencies between genes [ 5 , 7 ]. Some of the most popular correlation coefficients are Pearson, Kendall or Spearman [ 11 , 12 , 13 ]. Despite their popularity, statistical-based measures present some limitations [ 14 ].…”
Gene networks have arisen as a promising tool in the comprehensive modeling and analysis of complex diseases. Particularly in viral infections, the understanding of the host-pathogen mechanisms, and the immune response to these, is considered a major goal for the rational design of appropriate therapies. For this reason, the use of gene networks may well encourage therapy-associated research in the context of the coronavirus pandemic, orchestrating experimental scrutiny and reducing costs. In this work, gene co-expression networks were reconstructed from RNA-Seq expression data with the aim of analyzing the time-resolved effects of gene Ly6E in the immune response against the coronavirus responsible for murine hepatitis (MHV). Through the integration of differential expression analyses and reconstructed networks exploration, significant differences in the immune response to virus were observed in Ly6E Δ H S C compared to wild type animals. Results show that Ly6E ablation at hematopoietic stem cells (HSCs) leads to a progressive impaired immune response in both liver and spleen. Specifically, depletion of the normal leukocyte mediated immunity and chemokine signaling is observed in the liver of Ly6E Δ H S C mice. On the other hand, the immune response in the spleen, which seemed to be mediated by an intense chromatin activity in the normal situation, is replaced by ECM remodeling in Ly6E Δ H S C mice. These findings, which require further experimental characterization, could be extrapolated to other coronaviruses and motivate the efforts towards novel antiviral approaches.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.