ROP: dumpster diving in RNA-sequencing to find the source of 1 trillion reads across diverse adult human tissues

Mangul, Serghei; Yang, Harry Taegyun; Strauli, Nicolas; Gruhl, Franziska; Porath, Hagit T.; Hsieh, Kevin; Chen, Linus; Daley, Timothy; Christenson, S.; Wesolowska-Andersen, Agata; Spreafico, Roberto; Rios, Cydney; Eng, Celeste; Smith, Andrew D.; Hernández, Ryan D.; Ophoff, Roel A.; Santana, Jose Rodriguez; Levanon, Erez Y.; Woodruff, Prescott G.; Burchard, Esteban G.; Seibold, Max A.; Shifman, Sagiv; Eskin, Eleazar; Zaitlen, Noah

doi:10.1186/s13059-018-1403-7

Cited by 44 publications

(37 citation statements)

References 46 publications

“…The stored attributes can be configured for different scheduling systems (Table S1 lists the attributes in the table Job assuming a cluster with SGE). These records are retained over time to support job statistics and analytics. As this data is aggregated, the average memory and elapsed time for a given bioinformatics pipeline may be extracted as a function of the input parameters.…”

Section: Methods (mentioning)

confidence: 99%

“…Life science and biomedical researchers must choose from an unprecedented diversity of software tools and datasets designed for analyzing increasingly large outputs from modern genomics and sequencing technologies, which are supported by high-performance cluster infrastructures [3]. Scientific discovery in academia and industry now relies on the seamless integration of bioinformatics tools, omics datasets, and large clusters [4][5][6][7][8].…”

Section: Introduction (mentioning)

confidence: 99%

Telescope: an interactive tool for managing large-scale analysis from mobile devices

et al. 2020

Self Cite

In today's world of big data, computational analysis has become a key driver of biomedical research. High-performance computational facilities are capable of processing considerable volumes of data, yet often lack an easy-to-use interface to guide the user in supervising and adjusting bioinformatics analysis via a tablet or smartphone. Telescope is a novel tool that interfaces with high-performance computational clusters to deliver an intuitive user interface for controlling and monitoring bioinformatics analyses in real-time. Telescope provides a user-friendly method for integrating computational analyses with experimental biomedical research. Telescope is freely available at https://github.com/Mangul-Lab-USC/telescope.

“…During this step, there is also an optional filtering criteria that can be utilised to remove unaligned sequences which likely originate from a low complexity region or tandem repeat region. The filtering method is based on the tandem repeat detection step used in the ROP tool [19], which uses MegaBLAST [4] to align reads against a repeat sequence database, such as RepBase [2].…”

Section: Consensus Filtering (mentioning)

confidence: 99%

“…We also performed a comparison of the Scavenger pipeline against a recently published tool, Read Origin Protocol (ROP), which is primarily designed to identify the origin of unaligned reads [19]. The ROP tool consists of 6 steps, with each step designed to identify different causes for unaligned reads: reads with low quality, lost human reads, reads from repeat sequences, non-colinear RNA reads, reads from V(D)J recombination and reads belonging to microbial communities.…”

Section: Recovery Of Reads On Simulated Data (mentioning)

confidence: 99%

Scavenger: A pipeline for recovery of unaligned reads utilising similarity with aligned reads

Yang, Tang, Troup, et al. 2018

Preprint

Motivation: Read alignment is an important step in RNA-seq analysis as the result of alignment forms the basis for further downstream analysis. However, recent studies have shown that published alignment tools have variable mapping sensitivity and do not necessarily align reads which should have been aligned, a problem we termed as the false-negative non-alignment problem. Results: We have developed Scavenger, a pipeline for recovering unaligned reads using a novel mechanism which utilises information from aligned reads. Scavenger performs recovery of unaligned reads by re-aligning unaligned reads against a putative location derived from aligned reads with sequence similarity against unaligned reads. We show that Scavenger can successfully recover unaligned reads in both simulated and real RNA-seq datasets, including single-cell RNA-seq data. The reads recovered contain more genetic variants compared to previously aligned reads, indicating that divergence between personal and reference genome plays a role in the false-negative non-alignment problem. We also explored the impact of read recovery on downstream analysis, in particular gene expression analysis, and showed that Scavenger is able to both recover genes which were previously non-expressed and also increase gene expression, with lowly expressed genes having the most impact from the addition of recovered reads. We also found that the majority of genes with >1 fold change in expression after recovery are of pseudogenes category, indicating that pseudogenes expression can be substantially affected by the false-negative non-alignment problem.

“…Repurposing read data from human sequencing studies that do not map to the human genome may reveal the microbiome, in parallel with the primary study. A possible application of this principle involves unmapped RNA-sequencing data (6,31,32), with detection via PathSeq (33), the microbial discovery pipeline for sequencing data available in the Genome Analysis Toolkit (GATK).…”

Section: Introduction (mentioning)

confidence: 99%

Peripheral blood microbial signatures in COPD

Morrow, Castaldi, Chase, et al. 2020

Preprint

Background: The human microbiome has a role in the development of human diseases. Individual microbiome profiles are highly personalized, though many species are shared. Understanding the relationship between the human microbiome and disease may inform future individualized treatments. Specifically, the blood microbiome, once believed sterile, may be a surrogate for some lung and gut microbial characteristics. We sought associations between the blood microbiome and lung-relevant host factors. Methods: Based on reads not mapped to the human genome, we detected microbial nucleic acid signatures in peripheral blood RNA-sequencing for 2,590 current and former smokers with and without chronic obstructive pulmonary disease (COPD) from the COPDGene study. We used the GATK microbial pipeline PathSeq to infer microbial profiles. We tested associations between the inferred profiles and lung disease relevant phenotypes and examined links to host gene expression pathways. Results: The four phyla with highest abundance across all subjects were Proteobacteria, Actinobacteria, Firmicutes and Bacteroidetes. We observed associations between exacerbation phenotypes and the relative abundance of Staphylococcus, Acidovorax and Cupriavidus. The genus Flavobacterium was associated with emphysema and change in emphysema. Our host-microbiome interaction analysis revealed clustering of genera associated with emphysema, systemic inflammation, airway remodeling and exacerbations, through links to lung-relevant host pathways. Conclusions: This study is the first to identify a bacterial microbiome signature in the peripheral blood of current and former smokers. Understanding the relationships between the systemic microbial populations and lung disease severity may inform novel interventions and aid in the understanding of exacerbation phenotypes.
