High-throughput RNA-sequencing (RNA-seq) technologies provide an unprecedented opportunity to explore the individual transcriptome. Unmapped reads are a large and often overlooked output of standard RNA-seq analyses. Here, we present Read Origin Protocol (ROP), a tool for discovering the source of all reads originating from complex RNA molecules. We apply ROP to samples across 2630 individuals from 54 diverse human tissues. Our approach can account for 99.9% of 1 trillion reads of various read lengths. Additionally, we use ROP to investigate the functional mechanisms underlying connections between the immune system, microbiome, and disease. ROP is freely available at https://github.com/smangul1/rop/wiki. Electronic supplementary material: The online version of this article (10.1186/s13059-018-1403-7) contains supplementary material, which is available to authorized users.
ABSTRACT. AFLP markers combined with the bulk segregant analysis methodology were used for the identification of molecular markers associated with the cowpea golden mosaic virus (CGMV) resistance gene in 286 F2 cowpea plants derived from the cross IT97K-499-35 x Canapu T16. Segregation data in the F2 population demonstrated that tolerance to CGMV is controlled by a single dominant gene. Among the 196 combinations of AFLP primers tested, which generated approximately 3800 amplicons, three markers linked to the CGMV resistance gene were identified: E.AAC/M.CCC 515 at 4.3 cM, E.AGG/M.CTT 280 at 14.2 cM and E.AAA/M.CAG 352 at 16.8 cM, with 50.4, 24.4, and 28.7 LOD scores, respectively; the former two markers flank the CGMV loci. These markers could be used for the development of 'sequence characterized amplified region' type markers or for greater saturation of this region, to increase the precision of assisted selection for the development of cowpea strains tolerant to CGMV.
High-throughput RNA sequencing technologies have provided invaluable research opportunities across distinct scientific domains by producing quantitative readouts of the transcriptional activity of both entire cellular populations and single cells. The majority of RNA-Seq analyses begin by mapping each experimentally produced sequence (i.e., read) to a set of annotated reference sequences for the organism of interest. For both biological and technical reasons, a significant fraction of reads remains unmapped. In this work, we develop Read Origin Protocol (ROP) to discover the source of all reads originating from complex RNA molecules, recombinant T and B cell receptors, and microbial communities. We applied ROP to 8,641 samples across 630 individuals from 54 tissues. A fraction of RNA-Seq data (n=86) was obtained in-house; the remaining data was obtained from the Genotype-Tissue Expression (GTEx v6) project. To generalize the reported number of accounted reads, we also performed ROP analysis on thousands of different, randomly selected, and publicly available RNA-Seq samples in the Sequence Read Archive (SRA). Our approach can account for 99.9% of 1 trillion reads of various read lengths across the merged dataset (n=10641). Using in-house RNA-Seq data, we show that immune profiles of asthmatic individuals are significantly different from the profiles of control individuals, with decreased average per-sample T and B cell receptor diversity. We also show that immune diversity is inversely correlated with microbial load. Our results demonstrate the potential of ROP to exploit unmapped reads in order to better understand the functional mechanisms underlying connections between the immune system, microbiome, human gene expression, and disease etiology. ROP is freely available at https://github.com/smangul1/rop and currently supports human and mouse RNA-Seq reads.
After alignment, reads are grouped into genomic (e.g., CDS, UTRs, introns) and repetitive (e.g., SINEs, LINEs, LTRs) categories. The rest of the ROP protocol characterizes the remaining unmapped reads, which failed to map to the human reference sequences. The ROP protocol effectively processes the unmapped reads in seven steps. First, we apply a quality control step to exclude low-quality reads, low-complexity reads, and reads that match rRNA repeat units among the unmapped reads (FASTQC (Andrews & others, 2010), SEQCLEAN ("https://sourceforge.net/projects/seqclean/," n.d.)). Next, we employ Megablast (Camacho et al., 2009), a more sensitive alignment method, to search for human reads missed due to heuristics implemented for computational speed in conventional aligners and reads with additional mismatches. These reads typically include
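
The quality-control idea in the first step (dropping low-quality and low-complexity reads) can be illustrated with a small Python sketch. This is not ROP's implementation, which relies on FASTQC and SEQCLEAN; the `Read` type, the function names, the dinucleotide-entropy complexity measure, and both thresholds are assumptions chosen for illustration, and the rRNA-matching part of the step is not covered.

```python
import math
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    """Hypothetical container for one unmapped read."""
    name: str
    seq: str
    qual: str  # quality string, assumed to be Phred+33 encoded


def mean_phred(qual: str) -> float:
    """Average Phred score of a Phred+33 quality string."""
    if not qual:
        return 0.0
    return sum(ord(ch) - 33 for ch in qual) / len(qual)


def dinucleotide_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the read's overlapping dinucleotides.

    Homopolymers and short tandem repeats score near zero.
    """
    dimers = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not dimers:
        return 0.0
    total = len(dimers)
    counts = Counter(dimers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def filter_reads(
    reads: Iterable[Read],
    min_mean_quality: float = 20.0,   # illustrative threshold, an assumption
    min_entropy_bits: float = 1.5,    # illustrative threshold, an assumption
) -> Iterator[Read]:
    """Yield only reads that pass both the quality and complexity checks."""
    for read in reads:
        if mean_phred(read.qual) < min_mean_quality:
            continue  # low-quality read
        if dinucleotide_entropy(read.seq) < min_entropy_bits:
            continue  # low-complexity read
        yield read


if __name__ == "__main__":
    seq = "ACGTTGCAAGCTTAGGCTAACGTACGGATCC"
    example = [
        Read("good", seq, "I" * len(seq)),
        Read("polyA", "A" * 30, "I" * 30),
        Read("lowqual", seq, "#" * len(seq)),
    ]
    for kept in filter_reads(example):
        print(kept.name)
```

In this sketch only the read named `good` survives: the poly-A read fails the complexity check and the third read fails the mean-quality check. Real pipelines would tune both thresholds to read length and sequencing platform.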