Systematic assessment of long-read RNA-seq methods for transcript identification and quantification

Pardo-Palacios, Francisco; Reese, Fairlie; Carbonell-Sala, Sílvia; Diekhans, Mark; Liang, Cindy; Wang, Dingjie; Williams, Brian A.; Adams, Matthew S.; Behera, Amit; Lagarde, Julien; Li, Haoran; Prjibelski, Andrey D.; Balderrama-Gutierrez, Gabriela; Çelik, Muhammed Hasan; De María, Maite; Denslow, Nancy D.; García-Reyero, Natàlia; Goetz, Stefan M.; Hunter, Margaret E.; Loveland, Jane; Menor, Carlos; Moraga, David; Mudge, Jonathan M.; Takahashi, Hazuki; Tang, Alison D.; Youngworth, Ingrid; Carninci, Piero; Guigó, Roderic; Tilgner, Hagen; Wold, B; Vollmers, Christopher; Sheynkman, Gloria; Frankish, Adam; Au, Kin Fai; Conesa, Ana; Mortazavi, A; Brooks, Angela N.

doi:10.21203/rs.3.rs-777702/v1

Cited by 38 publications

(56 citation statements)

References 22 publications

“…Increasing accuracy to the level of PacBio Iso-Seq [ 23 , 24 , 42 ] could increase this number significantly. Paired with the higher throughput we can achieve by optimizing raw read to consensus read conversion as we have previously shown [ 43 ], future experiments could only retain UMIs which were observed more than once, similar to how we analyze Illumina data (see the “ Methods ” section).…”

Section: Discussion (mentioning)

confidence: 95%

Single-cell isoform analysis in human immune cells

Volden, Vollmers. 2022. Genome Biol.

Self Cite

High-throughput single-cell analysis today is facilitated by protocols like the 10X Genomics platform or Drop-Seq which generate cDNA pools in which the origin of a transcript is encoded at its 5′ or 3′ end. Here, we used R2C2 to sequence and demultiplex 12 million full-length cDNA molecules generated by the 10X Genomics platform from ~3000 peripheral blood mononuclear cells. We use these reads, independent from Illumina data, to identify B cell, T cell, and monocyte clusters and generate isoform-level transcriptomes for cells and cell types. Finally, we extract paired adaptive immune receptor sequences unique to each T and B cell.

“…However, as both ONT and PacBio sequencing improves in both coverage and sensitivity, an entire long-read-derived proteome should be able to be generated de novo from sample-specific transcriptomes. Furthermore, rigorous benchmarking studies, such as those being conducted by The Long-read RNA-seq Genome Annotation Assessment Project (LRGASP) Consortium, will reveal strength and limitations of these methods for the community [ 66 ].…”

Section: Discussion (mentioning)

confidence: 99%

Enhanced protein isoform characterization through long-read proteogenomics

et al. 2022

Self Cite

Background: The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g., PacBio or Oxford Nanopore) provides full-length transcripts which can be used to predict full-length protein isoforms. Results: We describe here a long-read proteogenomics approach for integrating sample-matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm for the direct incorporation of long-read transcriptome data to enable detection of protein isoforms previously intractable to MS-based detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis. Conclusions: Our work suggests that the incorporation of long-read sequencing and proteomic data can facilitate improved characterization of human protein isoform diversity. Our first-generation pipeline provides a strong foundation for future development of long-read proteogenomics and its adoption for both basic and translational research.

“…After preprocessing, only 2.5 million 1D reads remained compared to ~8 million R2C2 reads (Table 2). This means that even a much more productive 1D run, potentially generating up to 20 million raw reads for molecules of this length (15), would still generate fewer demultiplexed reads (~5 million) than the R2C2 run we performed here.…”

Section: Results (mentioning)

confidence: 99%

Illumina But With Nanopore: Sequencing Illumina libraries at high accuracy on the ONT MinION using R2C2

Zee, Deng, Adams, et al. 2021. Preprint.

Self Cite

High-throughput short-read sequencing has taken on a central role in research and diagnostics. Literally hundreds of different assays exist today to take advantage of Illumina short-read sequencers, the predominant short-read sequencing technology available today. Although other short read sequencing technologies exist, the ubiquity of Illumina sequencers in sequencing core facilities, and the inertia associated with the research enterprise as a whole have limited their adoption. Among a new generation of sequencing technologies, Oxford Nanopore Technologies (ONT) holds a unique position because the ONT MinION, an error-prone long-read sequencer, is associated with little to no capital cost. Here we show that we can make short-read Illumina libraries compatible with the long-read ONT MinION by circularizing and rolling circle amplifying the short library molecules using the R2C2 method. This results in longer DNA molecules containing tandem repeats of the original short library molecules. This longer DNA is ideally suited for the ONT MinION, and after sequencing, the tandem repeats in the resulting raw reads can be converted into millions of high-accuracy consensus reads with similar error rates to that of the Illumina MiSeq. We highlight this capability by producing and benchmarking RNA-seq, ChIP-seq, as well as regular and target-enriched Tn5 libraries. We also explore the use of this approach for rapid evaluation of sequencing library metrics by implementing a real-time analysis workflow.
