2023
DOI: 10.1038/s41587-022-01565-y

Accurate isoform discovery with IsoQuant using long reads

Abstract: Annotating newly sequenced genomes and determining alternative isoforms from long-read RNA data are complex and incompletely solved problems. Here we present IsoQuant, a computational tool using intron graphs that accurately reconstructs transcripts both with and without reference genome annotation. For novel transcript discovery, IsoQuant reduces the false-positive rate fivefold and 2.5-fold for Oxford Nanopore reference-based or reference-free mode, respectively. IsoQuant also improves performance for Pacific…

Cited by 49 publications (51 citation statements)
References 38 publications
“…The poly(A) tail lengths of the reads were estimated using Nanopolish-polyA (v10.2) (https://nanopolish.readthedocs.io/en/latest/quickstart_polya.html, [53]) on the reads previously aligned to the GRCh38.p13 human genome and PR8 reference genome using Minimap2 (v2.12) (https://github.com/lh3/minimap2; [73]) in splice mode. We used the mapping tool Isoquant (v3.3) (https://www.gencodegenes.org/human/, [74]) to assign a human gene to each human read using the GRCh38.p13 human genome and the human gene database gencode.v42. A version with and a version without mitochondrial sequences of both the genes and the genome references were used to delete reads for mitochondrial RNA.…”
Section: Methods; citation type: mentioning
confidence: 99%
“…Oxford Nanopore Technology (ONT) and PacBio HiFi sequencing yielded 250 × 10^6 and 38 × 10^6 barcoded long reads respectively for 395 cell clusters (e.g., P56:Thalamus:Replicate1:OPCs) obtained from the short-read analysis pipeline (Methods, Table S2-3). Using recent transcript-discovery software [48,49] high accuracy single-cell PacBio reads identified novel splice sites, enhancing the GENCODE annotation by 22.1% (40,184 transcripts). Over 67.3% of mapped, barcoded ONT reads (SD=4.51%, Table S2) represented multi-exonic transcripts with trustworthy splice sites, and ~70% of these ONT transcript models corresponded to annotated or PacBio-derived transcripts (Methods, Fig S1).…”
Section: Results; citation type: mentioning
confidence: 99%
“…In order to ensure the general applicability of scywalker outside of human (or even animal) data, we also generated two plant single-cell data sets (Arabidopsis thaliana), which were analyzed successfully by scywalker, also showing very high correlation with their respective short-read results. Isoform discovery in scywalker is based on IsoQuant, a proven tool showing good results in bulk long-read RNA-seq analysis [14]. In scywalker, the initial discovery is performed without taking cell barcodes into account, i.e., on bulk RNA.…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…The module ends with creating barcoded FASTQ files where each read has its corrected droplet barcode and UMI sequence added. In the second module, the droplet barcoded reads are aligned to the reference, and isoforms (and genes) are first detected and quantified in a bulk analysis using an adapted version of IsoQuant [14] for the non-organelle chromosomes. Gene counting for organelles is done separately using a specific method because organelles, given their different transcription structure and extreme read counts, often pose problems to isoform callers.…”
Section: Introduction; citation type: mentioning
confidence: 99%