2019
DOI: 10.1186/s13059-019-1895-9

deSALT: fast and accurate long transcriptomic read alignment with de Bruijn graph-based index

Abstract: The alignment of long-read RNA sequencing reads is non-trivial due to high sequencing errors and complicated gene structures. We propose deSALT, a tailored two-pass alignment approach, which constructs graph-based alignment skeletons to infer exons and uses them to generate spliced reference sequences to produce refined alignments. deSALT addresses several difficult technical issues, such as small exons and sequencing errors, which break through bottlenecks of long RNA-seq read alignment. Benchmarks demonstrat…
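The abstract describes a second pass that aligns reads against spliced reference sequences generated from exons inferred in the first pass. As a rough illustration of that idea only (this is not deSALT's code), the sketch below assumes the exons are already known as 0-based, half-open genomic intervals on a single strand; the helpers `build_spliced_reference` and `to_genome_coord` are hypothetical. It concatenates the exon sequences and records the offsets needed to map a position on the spliced sequence back to the genome.

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # assumed: 0-based, half-open (start, end)

def build_spliced_reference(genome: str, exons: List[Interval]):
    """Concatenate exon sequences into one spliced reference (hypothetical helper).

    Returns the spliced sequence, the offset of each exon within it, and the
    exons sorted by genomic start.
    """
    exons = sorted(exons)
    pieces, offsets, pos = [], [], 0
    for start, end in exons:
        offsets.append(pos)              # where this exon begins in the spliced sequence
        pieces.append(genome[start:end])
        pos += end - start
    return "".join(pieces), offsets, exons

def to_genome_coord(spliced_pos: int, offsets: List[int], exons: List[Interval]) -> int:
    """Map a 0-based position on the spliced reference back to the genome."""
    if spliced_pos < 0:
        raise IndexError("negative position")
    i = bisect_right(offsets, spliced_pos) - 1   # last exon starting at or before the position
    start, end = exons[i]
    genome_pos = start + (spliced_pos - offsets[i])
    if genome_pos >= end:
        raise IndexError("position lies past the end of the spliced reference")
    return genome_pos
```

For example, exons (0, 4) and (8, 12) on the 16-bp genome AAAACCCCGGGGTTTT give the spliced sequence AAAAGGGG, and spliced position 5 maps back to genomic position 9. Keeping the offsets sorted lets the reverse mapping use a binary search instead of a scan over all exons.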


Cited by 47 publications (36 citation statements). References: 42 publications.
“…To this end, the de Bruijn graph has become an object of central importance in many genomic analysis tasks. While it was initially used mostly in the context of genome (and transcriptome) assembly (EULER [42], Velvet [51,52], ALLPATHS [9,30], EULER-SR [10], ABySS [46], SOAPdenovo [25,29], Trans-AByss [43], SPAdes [5], Minia [13]), it has seen increasing use in comparative genomics (Cortex [19], DISCOSNP [50], Scalpel [15], BubbZ [34]) and has also been used increasingly in the context of indexing genomic data, either from raw sequencing reads (Mantis [40,1], Vari [37], VariMerge [36], MetaGraph [20]), or from assembled reference sequences (deBGA [27], Pufferfish [2], deSALT [28]), or from both (BLight [32], Bifrost [17]). These latter applications most frequently make use of the (colored) compacted de Bruijn graph, a variant of the de Bruijn graph in which maximal non-branching paths (unitigs) are condensed into single vertices in the underlying graph structure.…”
Section: Introduction (mentioning; confidence: 99%)
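The statement above defines the compacted de Bruijn graph as one in which maximal non-branching paths (unitigs) are condensed into single vertices. The sketch below is a simplified, node-centric illustration of that compaction under stated assumptions: k-mers are taken from the forward strand only (no reverse complements, no colors), and the function names are hypothetical. It is not how deSALT, deBGA, Pufferfish, or Bifrost actually build their indexes.

```python
from typing import Iterable, List, Set

ALPHABET = "ACGT"

def kmer_set(seqs: Iterable[str], k: int) -> Set[str]:
    """Collect every k-mer (forward strand only) from the input sequences."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}

def successors(x: str, kmers: Set[str]) -> List[str]:
    # k-mers that overlap x by k-1 characters on its right.
    return [x[1:] + c for c in ALPHABET if x[1:] + c in kmers]

def predecessors(x: str, kmers: Set[str]) -> List[str]:
    # k-mers that overlap x by k-1 characters on its left.
    return [c + x[:-1] for c in ALPHABET if c + x[:-1] in kmers]

def is_interior(x: str, kmers: Set[str]) -> bool:
    """True if x continues a non-branching path from its single predecessor."""
    preds = predecessors(x, kmers)
    return len(preds) == 1 and len(successors(preds[0], kmers)) == 1

def compact_unitigs(kmers: Set[str]) -> List[str]:
    """Condense maximal non-branching paths into unitig strings."""
    unitigs: List[str] = []
    visited: Set[str] = set()

    def walk(start: str) -> str:
        path, cur = start, start
        visited.add(start)
        while True:
            succ = successors(cur, kmers)
            if len(succ) != 1:
                break                      # branch or dead end: the unitig stops here
            nxt = succ[0]
            if not is_interior(nxt, kmers) or nxt in visited:
                break                      # next k-mer starts a new unitig, or we closed a cycle
            path += nxt[-1]                # consecutive k-mers overlap by k-1 characters
            visited.add(nxt)
            cur = nxt
        return path

    # Start a unitig at every k-mer that is not in the middle of a path.
    for x in sorted(kmers):
        if not is_interior(x, kmers):
            unitigs.append(walk(x))
    # Any k-mers left over lie on isolated cycles; break each cycle anywhere.
    for x in sorted(kmers):
        if x not in visited:
            unitigs.append(walk(x))
    return unitigs
```

With k = 3, the reads ACGTA and ACGTC share the path ACG → CGT, which then branches, so the graph compacts into three unitigs: ACGT, GTA, and GTC.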
“…To this end, the de Bruijn graph has become an object of central importance in many genomic analysis tasks. While it was initially used mostly in the context of genome (and transcriptome) assembly (EULER (Pevzner et al, 2001), EULER-SR (Chaisson and Pevzner, 2008), Velvet (Zerbino and Birney, 2008; Zerbino et al, 2009), ALLPATHS (Butler et al, 2008; MacCallum et al, 2009), ABySS (Simpson et al, 2009), Trans-AByss (Robertson et al, 2010), SPAdes (Bankevich et al, 2012), Minia (Chikhi and Rizk, 2013), SOAPdenovo (Li et al, 2010; Luo et al, 2015)), it has seen increasing use in comparative genomics (Cortex (Iqbal et al, 2012), DISCOSNP (Uricaru et al, 2014), Scalpel (Fang et al, 2016), BubbZ (Minkin and Medvedev, 2020)), and has also been used increasingly in the context of indexing genomic data, either from raw sequencing reads (Vari (Muggli et al, 2017), Mantis (Pandey et al, 2018; Almodaresi et al, 2019), VariMerge (Muggli et al, 2019), MetaGraph (Karasikov et al, 2020)), or from assembled reference sequences (deBGA (Liu et al, 2016), Pufferfish (Almodaresi et al, 2018), deSALT (Liu et al, 2019)), or from both (BLight (Marchet et al, 2019), Bifrost (Holley and Melsted, 2020)). These latter applications most frequently make use of the (colored) compacted de Bruijn graph, a variant of the de Bruijn graph in which the maximal non-branching paths (also referred to as unitigs) are condensed into single vertices in the underlying graph structure.…”
Section: Introduction (mentioning; confidence: 99%)
“…Making RNAseq read aligners aware of these sequence features (as is the case for the commonly used spliced aligners STAR (16), HISAT2 (17) and minimap2 (18)) can significantly improve the alignment of reads at splice junctions. In addition, where genome and transcriptome annotations exist, many alignment tools allow users to provide sets of correct splice junctions to guide alignment (16)(17)(18)(19). Introns containing these guide splice junctions are penalised less than novel introns, resulting in fewer alignment errors.…”
Section: Introduction (mentioning; confidence: 99%)
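The statement above notes that introns whose junctions appear in a supplied guide set are penalised less than novel introns. A minimal sketch of that scoring rule, assuming a hypothetical junction encoding of (chromosome, intron_start, intron_end) and illustrative penalty values that are not taken from STAR, HISAT2, minimap2, or any other aligner:

```python
from typing import List, Set, Tuple

# Hypothetical junction encoding: (chromosome, intron_start, intron_end).
Junction = Tuple[str, int, int]
Candidate = Tuple[float, List[Junction]]  # (base alignment score, introns used)

def intron_penalty(j: Junction, annotated: Set[Junction],
                   novel: float = 10.0, guided: float = 2.0) -> float:
    """Penalty for opening an intron; annotated (guide) junctions cost less.

    The values 10.0 and 2.0 are illustrative assumptions only.
    """
    return guided if j in annotated else novel

def total_score(c: Candidate, annotated: Set[Junction]) -> float:
    """Base alignment score minus the penalty for every intron it opens."""
    base, introns = c
    return base - sum(intron_penalty(j, annotated) for j in introns)

def best_alignment(candidates: List[Candidate], annotated: Set[Junction]) -> Candidate:
    """Return the candidate with the highest score after intron penalties."""
    return max(candidates, key=lambda c: total_score(c, annotated))
```

For candidates a = (50, [chr1:100-200]) and b = (55, [chr1:105-200]), with no annotation b wins (55 - 10 = 45 against 50 - 10 = 40); once a's junction is supplied as a guide, a scores 48 and is chosen instead. This is how a guide set can pull an otherwise slightly worse alignment onto a known splice junction.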
“…Two-pass alignment has also been used to improve splice junction detection and quantification (16,19,20). In a two-pass alignment approach, splice junctions detected in a first round of alignment are scored less negatively in a second round, thereby allowing information sharing between alignments.…”
Section: Introduction (mentioning; confidence: 99%)
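In the two-pass approach described above, junctions found in the first round are fed back into the second. The sketch below shows only the collection step, under the assumption that first-pass alignments are available as SAM-style (chromosome, 1-based POS, CIGAR) triples: each CIGAR `N` operation (a skipped reference region, i.e. an intron) yields a junction, and junctions seen in at least `min_support` reads are kept. The function names and the default threshold are hypothetical, and deSALT itself builds alignment skeletons rather than following this exact procedure.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Set, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")   # operations that advance the reference position (SAM spec)

Junction = Tuple[str, int, int]  # (chromosome, intron_start, intron_end), 1-based inclusive

def junctions_from_cigar(chrom: str, pos: int, cigar: str) -> Iterator[Junction]:
    """Yield one junction per N operation; pos is the SAM POS field (1-based)."""
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            yield (chrom, ref, ref + n - 1)
        if op in REF_CONSUMING:
            ref += n

def collect_junctions(first_pass: Iterable[Tuple[str, int, str]],
                      min_support: int = 2) -> Set[Junction]:
    """Count junctions across first-pass alignments and keep well-supported ones."""
    counts: Counter = Counter()
    for chrom, pos, cigar in first_pass:
        counts.update(junctions_from_cigar(chrom, pos, cigar))
    return {j for j, c in counts.items() if c >= min_support}
```

For instance, the CIGAR 5M2D3M50N20M starting at position 100 places the intron at 110-159, because the deletion consumes reference bases while an insertion would not. Requiring support from several reads is one simple way to keep a single erroneous first-pass alignment from biasing the second pass.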
“…In many cases these higher error rates can prevent the correct identification of isoforms (11)(12)(13). Although several alignment software (14)(15)(16)(17)(18) are optimized to handle these errors, their shortcomings confound transcript identification and annotation. Many reads cannot be aligned and regions where the sequencing error rates are higher such as UTRs frequently produce ambiguous alignments.…”
Section: Introduction (mentioning; confidence: 99%)