2018
DOI: 10.1186/s13059-018-1403-7

ROP: dumpster diving in RNA-sequencing to find the source of 1 trillion reads across diverse adult human tissues

Abstract: High-throughput RNA-sequencing (RNA-seq) technologies provide an unprecedented opportunity to explore the individual transcriptome. Unmapped reads are a large and often overlooked output of standard RNA-seq analyses. Here, we present Read Origin Protocol (ROP), a tool for discovering the source of all reads originating from complex RNA molecules. We apply ROP to samples across 2630 individuals from 54 diverse human tissues. Our approach can account for 99.9% of 1 trillion reads of various read length. Addition…

citations
Cited by 44 publications
(37 citation statements)
references
References 46 publications
“…The stored attributes can be configured for different scheduling systems (Table S1 lists the attributes in the table Job assuming a cluster with SGE). These records are retained over time to support job statistics and analytics. As this data is aggregated, the average memory and elapsed time for a given bioinformatics pipeline may be extracted as a function of the input parameters.…”
Section: Methods; citation type: mentioning
confidence: 99%
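
The aggregation described in this statement can be illustrated with a minimal sketch. The record fields (`pipeline`, `threads`, `max_mem_gb`, `elapsed_s`), the sample values, and the `summarize` helper are all hypothetical; they are not the attributes of the Job table in Table S1, which is not reproduced here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical job records; field names and values are illustrative only
# and are not taken from the cited paper's Job table.
jobs = [
    {"pipeline": "align", "threads": 4, "max_mem_gb": 8.0, "elapsed_s": 1200},
    {"pipeline": "align", "threads": 4, "max_mem_gb": 10.0, "elapsed_s": 1400},
    {"pipeline": "align", "threads": 8, "max_mem_gb": 12.0, "elapsed_s": 700},
    {"pipeline": "count", "threads": 4, "max_mem_gb": 2.0, "elapsed_s": 300},
]


def summarize(records, pipeline, param):
    """Average memory and elapsed time for one pipeline, grouped by one input parameter."""
    groups = defaultdict(list)
    for record in records:
        if record["pipeline"] == pipeline:
            groups[record[param]].append(record)
    return {
        value: {
            "mean_mem_gb": mean(r["max_mem_gb"] for r in rows),
            "mean_elapsed_s": mean(r["elapsed_s"] for r in rows),
        }
        for value, rows in groups.items()
    }


summary = summarize(jobs, "align", "threads")
```

Here `summary` maps each value of the chosen parameter to the mean memory and mean elapsed time of the matching jobs, which is one way to read "as a function of the input parameters".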
“…Life science and biomedical researchers must choose from an unprecedented diversity of software tools and datasets designed for analyzing increasingly large outputs from modern genomics and sequencing technologies, which are supported by high-performance cluster infrastructures [3]. Scientific discovery in academia and industry now relies on the seamless integration of bioinformatics tools, omics datasets, and large clusters [4][5][6][7][8].…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…During this step, there is also an optional filtering criteria that can be utilised to remove unaligned sequences which likely originate from a low complexity region or tandem repeat region. The filtering method is based on the tandem repeat detection step used in the ROP tool [19], which uses MegaBLAST [4] to align reads against a repeat sequence database, such as RepBase [2].…”
Section: Consensus Filtering; citation type: mentioning
confidence: 99%
“…We also performed a comparison of the Scavenger pipeline against a recently published tool, Read Origin Protocol (ROP), which is primarily designed to identify the origin of unaligned reads [19]. The ROP tool consists of 6 steps, with each step designed to identify different causes for unaligned reads: reads with low quality, lost human reads, reads from repeat sequences, non-colinear RNA reads, reads from V(D)J recombination and reads belonging to microbial communities.…”
Section: Recovery Of Reads On Simulated Data; citation type: mentioning
confidence: 99%
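
The six categories named in this statement can be pictured as an ordered list of checks applied to each unaligned read. The sketch below is only an illustration of that idea: the first-match-wins ordering, the read fields (`mean_qual`, `maps_to_human`, and so on), the quality threshold of 20, and the `unaccounted` fallback are all assumptions, not ROP's actual tests or interface.

```python
from collections import Counter

# Category names follow the six causes listed in the quoted statement.
# Each predicate is a hypothetical placeholder, not ROP's real criterion.
STEPS = [
    ("low_quality", lambda read: read.get("mean_qual", 0) < 20),
    ("lost_human", lambda read: read.get("maps_to_human", False)),
    ("repeat", lambda read: read.get("hits_repeat_db", False)),
    ("non_colinear", lambda read: read.get("non_colinear", False)),
    ("vdj", lambda read: read.get("vdj", False)),
    ("microbial", lambda read: read.get("microbial", False)),
]


def classify(read):
    """Return the first category whose check passes (assumed ordering)."""
    for name, check in STEPS:
        if check(read):
            return name
    return "unaccounted"


def tally(reads):
    """Count how many reads fall into each category."""
    return Counter(classify(read) for read in reads)


example_reads = [
    {"mean_qual": 10, "microbial": True},  # fails the quality check first
    {"mean_qual": 35, "vdj": True},
    {"mean_qual": 35, "microbial": True},
    {"mean_qual": 35},                     # matches none of the checks
]
counts = tally(example_reads)
```

Applying the checks in a fixed order means a read is assigned to exactly one category, so the per-category counts sum to the number of input reads.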
“…Repurposing read data from human sequencing studies that do not map to the human genome may reveal the microbiome, in parallel with the primary study. A possible application of this principle involves unmapped RNA-sequencing data (6,31,32), with detection via PathSeq (33), the microbial discovery pipeline for sequencing data available in the Genome Analysis Toolkit (GATK).…”
Section: Introductionmentioning
confidence: 99%