A field guide for the compositional analysis of any-omics data

Quinn, Thomas P.; Erb, Ionas; Gloor, Gregory B.; Notredame, Cédric; Richardson, Mark F.; Crowley, Tamsyn M.

doi:10.1093/gigascience/giz107

Cited by 230 publications (216 citation statements: 216 mentioning)

References 79 publications

“…For example, transcription of asRNA may constitute a significant percentage of the data and may be associated with only a few genes. As sequencing data are inherently compositional, there will be an overrepresentation of spurious negative correlations with the remaining gene population, which cannot be amended using traditional quantitative data analysis (41). This is true regardless of whether the highly expressed genes are systematically related to the experiment or not.…”

Section: Results (mentioning, confidence: 99%)
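The statement above argues that closure (dividing each sample by its total, which is what relative sequencing data amount to) produces spurious negative correlations when a few features dominate the library. The following is a minimal NumPy simulation under assumptions of our own. The "absolute" abundances are independent log-normal draws, and gene 0 is a hypothetical dominant, highly variable feature standing in for a gene with heavy antisense signal. The parameter values are illustrative, not taken from any cited study.

```python
import numpy as np


def close(counts):
    """Closure: divide each sample (row) by its total so the row sums to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def corr_with_first(x):
    """Pearson correlation between column 0 and every other column."""
    return np.array([np.corrcoef(x[:, 0], x[:, j])[0, 1] for j in range(1, x.shape[1])])


rng = np.random.default_rng(0)
n_samples, n_genes = 200, 20

# Hypothetical "absolute" abundances drawn independently: no gene truly depends on another.
absolute = rng.lognormal(mean=3.0, sigma=0.5, size=(n_samples, n_genes))
# Gene 0: a hypothetical dominant, highly variable feature (e.g. heavy antisense signal).
absolute[:, 0] = rng.lognormal(mean=7.0, sigma=1.0, size=n_samples)

# Treat the observed data as proportions, i.e. close every sample.
relative = close(absolute)

print("mean r with gene 0, absolute:", round(corr_with_first(absolute).mean(), 2))
print("mean r with gene 0, relative:", round(corr_with_first(relative).mean(), 2))
```

On the absolute scale the correlations scatter around zero. After closure they become clearly negative, even though no gene depends on gene 0.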

“…The naive solution will be to quantify mRNA only, ensuring that there are sufficient data for proper mRNA quantification. However, compositional data analysis methods exist, for which these issues can be amended (41,42). As a minimum, we encourage paying attention to highly expressed genes with high fractions of asRNA, e.g., >90%, and either naively discarding them from downstream analysis or performing a thorough investigation to verify their credibility using existing tools for detecting spurious open reading frames, such as AntiFam (35).…”

Section: Results (mentioning, confidence: 99%)
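The screening step suggested above (highly expressed genes whose reads are more than 90% antisense) can be sketched in NumPy. This is a sketch under stated assumptions. Per-gene sense and antisense read counts are taken as given. "Highly expressed" is defined here by a hypothetical quantile cut-off on total reads, because the statement gives no threshold for it. The function names are illustrative and do not come from the cited work or from AntiFam.

```python
import numpy as np


def antisense_fraction(sense, antisense):
    """Fraction of each gene's reads that map antisense; genes with no reads get 0."""
    sense = np.asarray(sense, dtype=float)
    antisense = np.asarray(antisense, dtype=float)
    total = sense + antisense
    return np.divide(antisense, total, out=np.zeros_like(total), where=total > 0)


def flag_antisense_heavy(sense, antisense, max_fraction=0.9, expression_quantile=0.9):
    """Flag genes that are highly expressed AND dominated by antisense reads.

    'Highly expressed' is a hypothetical cut-off: total reads at or above the
    given quantile across genes. max_fraction=0.9 mirrors the >90% example.
    """
    total = np.asarray(sense, dtype=float) + np.asarray(antisense, dtype=float)
    highly_expressed = total >= np.quantile(total, expression_quantile)
    return highly_expressed & (antisense_fraction(sense, antisense) > max_fraction)


sense = [500, 10, 5, 300, 20]
antisense = [20, 0, 995, 5, 30]
flags = flag_antisense_heavy(sense, antisense)
print(flags)  # only the third gene (995 of 1000 reads antisense) is flagged
keep = ~flags  # one option from the statement: drop flagged genes downstream
```

Flagging rather than dropping outright leaves room for the other option the statement mentions, a closer look at the flagged genes with a tool such as AntiFam.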


The Signal and the Noise: Characteristics of Antisense RNA in Complex Microbial Communities

et al. 2020


High-throughput sequencing has allowed unprecedented insight into the composition and function of complex microbial communities. With metatranscriptomics, it is possible to interrogate the transcriptomes of multiple organisms simultaneously to get an overview of the gene expression of the entire community. Studies have successfully used metatranscriptomics to identify and describe relationships between gene expression levels and community characteristics. However, metatranscriptomic data sets contain a rich suite of additional information that is just beginning to be explored. Here, we focus on antisense expression in metatranscriptomics, discuss the different computational strategies for handling it, and highlight the strengths but also potentially detrimental effects on downstream analysis and interpretation. We also analyzed the antisense transcriptomes of multiple genomes and metagenome-assembled genomes (MAGs) from five different data sets and found high variability in the levels of antisense transcription for individual species, which were consistent across samples. Importantly, we challenged the conceptual framework that antisense transcription is primarily the product of transcriptional noise and found mixed support, suggesting that the total observed antisense RNA in complex communities arises from the combined effect of unknown biological and technical factors. Antisense transcription can be highly informative, including technical details about data quality and novel insight into the biology of complex microbial communities.

IMPORTANCE: This study systematically evaluated the global patterns of microbial antisense expression across various environments and provides a bird’s-eye view of general patterns observed across data sets, which can provide guidelines in our understanding of antisense expression as well as interpretation of metatranscriptomic data in general.
This analysis highlights that in some environments, antisense expression from microbial communities can dominate over regular gene expression. We explored some potential drivers of antisense transcription, but more importantly, this study serves as a starting point, highlighting topics for future research and providing guidelines to include antisense expression in generic bioinformatic pipelines for metatranscriptomic data.


“…These standard strategies are widely employed, but have recently been questioned due to the compositional nature of whole metagenomic sequencing data [29,30]. To address this issue, several Compositional Data Analysis (CoDA) approaches to analyze sequencing datasets have been recently proposed [31,32].…”

Section: Such Unique Features Make Standard Parametric Tests and Most (mentioning, confidence: 99%)
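The statement does not say which CoDA approaches references [31,32] propose. A common building block of such approaches is the centred log-ratio (clr) transform, which expresses every feature relative to the geometric mean of its sample. Below is a minimal NumPy sketch. The pseudocount used to handle zeros is our assumption, and zero handling is a debated choice in its own right.

```python
import numpy as np


def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform of a samples x features count matrix.

    Each value becomes log(x / g(x)), where g(x) is the geometric mean of its
    sample (row). The pseudocount is a simple, assumed way to handle zeros.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    if np.any(x <= 0):
        raise ValueError("clr needs strictly positive values; raise the pseudocount")
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)


shallow = np.array([[10, 20, 70]])
deep = shallow * 5  # same composition, sequenced 5x deeper
print(clr(shallow, pseudocount=0))
print(clr(deep, pseudocount=0))  # identical: clr ignores library size
```

The scale invariance shown holds exactly only without a pseudocount. Adding one to counts of different depth breaks it slightly, which is one reason zero handling matters in compositional analysis.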

ResistoXplorer: a web-based tool for visual, statistical and exploratory data analysis of resistome data

Dhariwal, Junges, Chen et al. 2020

Preprint


Background: The study of resistomes using whole metagenomic sequencing enables access to the large repertoire of resistance genes usually found in complex microbial communities, such as the human microbiome. Over recent years, sophisticated and diverse pipelines have been established to facilitate raw data processing and annotation. Despite the progress, there are no easy-to-use tools for comprehensive visual, statistical, and functional analysis of resistome data. Thus, exploration of the resulting large complex datasets remains a key bottleneck requiring robust computational resources and technical expertise, which creates a significant hurdle for advancements in the field. Results: Here, we introduce ResistoXplorer, a user-friendly tool that integrates recent advancements in statistics and visualization, coupled with extensive functional annotations and phenotype collection, to enable high-throughput analysis of common outputs generated from resistome studies. ResistoXplorer contains three modules: the Antimicrobial Resistance Gene Table module offers various options for composition profiling, functional profiling and comparative analysis of resistome data; the Integration module supports integrative exploratory analysis of resistome and microbiome abundance profiles in metagenomic samples; finally, the Antimicrobial Resistance Gene List module enables users to explore antimicrobial resistance genes according to function and potential microbial hosts using visual analytics to gain biological insights. Within these three modules, ResistoXplorer offers comprehensive assistance for ARG functional annotations along with their microbe and phenotype associations based on data collected from >10 reference databases.
In addition, it provides support for a variety of methods for composition profiling, visualization and exploratory data analysis, as well as extensive support for various data normalization methods and machine learning algorithms for identification of resistome signatures. Finally, ResistoXplorer also offers network visualization for intuitive exploration of associations between antimicrobial resistance genes and the microbial hosts, incorporated with functional enrichment analysis support. Conclusions: ResistoXplorer is a web-based tool with a user-friendly interface that enables comprehensive and real-time downstream analysis of resistome data. As such, it allows for in-depth exploration of metagenomic datasets focusing on the intrinsic networks and correlations of antimicrobial resistance genes and their underlying microbial determinants. ResistoXplorer will assist researchers and clinicians in the field of AMR to facilitate discovery in large-scale and multi-dimensional metagenomic datasets. ResistoXplorer is publicly available at http://resistoxplorer.no/ResistoXplorer/.


“…Beyond requiring several pre-processing steps, the summarized data arise from a sampling process that introduces between-sample biases in which the total number of counts, called the sequencing depth, depends on technical factors, not on the amount of input material [12,42,37]. Analysts often attempt to remove this bias with an effective library size normalization, or with normalization to a spike-in or house-keeping transcript [27] (though all normalizations have limitations [36]). Instead, one could build normalization-free gene co-expression networks using proportionality [26].…”

Section: Introduction (mentioning, confidence: 99%)
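The statement proposes proportionality as a normalization-free alternative to correlation for co-expression networks. One widely used proportionality coefficient is rho. For features i and j it is 1 - var(log(x_i / x_j)) / (var(clr_i) + var(clr_j)), and it equals 1 when the two features keep a constant ratio across samples. Whether reference [26] uses exactly this variant is not stated here. The following NumPy sketch assumes a samples x features count matrix and a simple pseudocount for zeros. It is not the implementation from any particular package.

```python
import numpy as np


def proportionality_rho(counts, pseudocount=1.0):
    """Pairwise proportionality rho for a samples x features count matrix.

    rho_ij = 1 - var(log(x_i / x_j)) / (var(clr_i) + var(clr_j)); 1 means the two
    features keep a constant ratio across samples. The pseudocount is an assumption.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    if np.any(x <= 0):
        raise ValueError("rho needs strictly positive values; raise the pseudocount")
    log_x = np.log(x)
    z = log_x - log_x.mean(axis=1, keepdims=True)  # clr transform, row by row
    cov = np.cov(z, rowvar=False)                  # feature x feature covariance (ddof=1)
    var = np.diag(cov)
    var_sum = var[:, None] + var[None, :]
    # var(z_i - z_j) equals var(log(x_i / x_j)) because the row means cancel.
    log_ratio_var = var_sum - 2 * cov
    return 1 - log_ratio_var / var_sum


counts = np.array([
    [10, 30, 5, 50],
    [20, 60, 40, 10],
    [5, 15, 20, 30],
    [40, 120, 10, 25],
])  # hypothetical data: feature 1 is always 3x feature 0
rho = proportionality_rho(counts, pseudocount=0)
print(rho.round(2))
```

Because only log-ratios enter the formula, rescaling each sample by its own factor (a different sequencing depth) leaves rho unchanged. This is the sense in which no library-size normalization is needed.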

Personalized Single-Cell Networks: A Framework to Predict The Response of Any Gene To Any Drug For Any Patient

Harikumar, Quinn, Rana et al. 2020

Preprint

Self Cite


Background: The last decade has seen a major increase in the availability of genomic data. This includes expert-curated databases that describe the biological activity of genes, as well as high-throughput assays that measure the gene expression of bulk tissue and single cells. Integrating these heterogeneous data sources can generate new hypotheses about biological systems. Our primary objective is to combine population-level drug-response data with patient-level single-cell expression data to predict how any gene will respond to any drug for any patient. Methods: We use a “dual-channel” random walk with restart algorithm to perform 3 analyses. First, we use glioblastoma single cells from 5 individual patients to discover genes whose functions differ between cancers. Second, we use drug screening data from the Library of Integrated Network-Based Cellular Signatures (LINCS) to show how a cell-specific drug-response signature can be accurately predicted from a baseline (drug-free) gene co-expression network. Finally, we combine both data streams to show how we can predict how any gene will respond to any drug for each of the 5 glioblastoma patients. Conclusions: Our manuscript introduces two innovations to the integration of heterogeneous biological data. First, we use a “dual-channel” method to predict up-regulation and down-regulation separately. Second, we use individualized single-cell gene co-expression networks to make personalized predictions. These innovations let us predict gene function and drug response for individual patients. When applied to real data, we identify a number of genes that exhibit a patient-specific drug response, including the pan-cancer oncogene EGFR.
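The abstract names a "dual-channel" random walk with restart but does not describe how the two channels (up-regulation and down-regulation) are built. For orientation only, the sketch below implements the standard single-channel random walk with restart on a gene network. The adjacency matrix, seed genes, restart probability and function name are illustrative assumptions, not the authors' method. The walk iterates p <- (1 - r) W p + r p0 to a fixed point, where W is the column-normalized adjacency matrix and p0 puts equal mass on the seeds.

```python
import numpy as np


def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Standard random walk with restart; returns each node's affinity to the seeds."""
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    # Column-normalize so each column is a transition distribution; isolated nodes keep a zero column.
    W = np.divide(A, col_sums, out=np.zeros_like(A), where=col_sums > 0)
    p0 = np.zeros(A.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # restart distribution spread evenly over the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 change small enough: treat as converged
            return p_next
        p = p_next
    return p


# Hypothetical 4-gene co-expression network shaped as a path 0 - 1 - 2 - 3.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
scores = random_walk_with_restart(path, seeds=[0], restart=0.3)
print(scores.round(3))  # affinity to gene 0 decays with distance along the path
```

A dual-channel variant would presumably run separate walks over differently signed or differently weighted networks, but that is speculation beyond what the abstract states.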
