Linear-time String Indexing and Analysis in Small Space

Belazzougui, Djamal; Cunial, Fabio; Kärkkäinen, Juha; Mäkinen, Veli

doi:10.1145/3381417

Cited by 26 publications

(75 citation statements)

References 71 publications

“…iGenomics uses a version of QuickSort, a divide-and-conquer sorting algorithm, because on average it takes O(n log n) time for n objects to be sorted. Although there are now some more efficient BWT construction algorithms [31], given that iGenomics is targeted towards relatively small genomes (<100,000 bp), the amount of time for BWT sorting is negligible compared to the time to align the reads. Finally, to obtain the BWT from the sorted array, the final character of each row in the matrix is copied into a string with the first character copied having the first position, the second character copied having the second position, and so forth.…”

Section: Methods (mentioning)

confidence: 99%

iGenomics: Comprehensive DNA sequence analysis on your Smartphone

et al. 2020

Background: Following the miniaturization of integrated circuitry and other computer hardware over the past several decades, DNA sequencing is on a similar path. Leading this trend is the Oxford Nanopore sequencing platform, which currently offers the hand-held MinION instrument and even smaller instruments on the horizon. This technology has been used in several important applications, including the analysis of genomes of major pathogens in remote stations around the world. However, despite the simplicity of the sequencer, an equally simple and portable analysis platform is not yet available. Results: iGenomics is the first comprehensive mobile genome analysis application, with capabilities to align reads, call variants, and visualize the results entirely on an iOS device. Implemented in Objective-C using the FM-index, banded dynamic programming, and other high-performance bioinformatics techniques, iGenomics is optimized to run in a mobile environment. We benchmark iGenomics using a variety of real and simulated Nanopore sequencing datasets of viral and bacterial genomes and show that iGenomics has performance comparable to the popular BWA-MEM/SAMtools/IGV suite, without necessitating a laptop or server cluster. Conclusions: iGenomics is available open source (https://github.com/stuckinaboot/iGenomics) and for free on Apple's App Store (https://apple.co/2HCplzr).
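
The citation statement above describes building the BWT by sorting the rows of the rotation matrix and then reading off the last character of each sorted row. A minimal Python sketch of that idea follows. It is not the iGenomics Objective-C code: the function name `bwt_by_sorting` and the `$` end-of-text sentinel are assumptions made for illustration, and Python's built-in `sorted` (Timsort) stands in for the QuickSort variant mentioned in the statement.

```python
def bwt_by_sorting(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all rotations, then take the last column.

    Assumption: `sentinel` does not occur in `text` and sorts before
    every other character (true for "$" versus A/C/G/T and lowercase).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")

    s = text + sentinel
    n = len(s)

    # Build every rotation of s; row i starts at position i.
    # This costs O(n^2) memory, acceptable only for small inputs.
    rotations = [s[i:] + s[:i] for i in range(n)]

    # Sort the rows lexicographically (Timsort here, not QuickSort).
    rotations.sort()

    # Copy the final character of each sorted row, in row order.
    return "".join(row[-1] for row in rotations)


if __name__ == "__main__":
    print(bwt_by_sorting("banana"))  # annb$aa
```

The sort performs O(n log n) comparisons, but each comparison of two rotations can itself take O(n) character steps, so this sketch is only reasonable for the small genomes the statement has in mind. The linear-time, small-space constructions in the cited paper avoid materializing the rotation matrix altogether.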

“…We will also propose a heuristic version of the algorithm that solves a relaxed variant of Problem 1 in linear time O(n). All these complexities are on top of the FMD-index construction [25], which in our case can be done in O(m) time and space [5].…”

Section: Problem Definition (mentioning)

confidence: 99%

Comparative genome analysis using sample-specific string detection in accurate long reads

Khorsand, Denti, Bonizzoni, et al. 2021

Preprint

Motivation: Comparative genome analysis of two or more whole-genome sequenced (WGS) samples is at the core of most applications in genomics. These include discovery of genomic differences segregating in population, case-control analysis in common diseases, and rare disorders. With the current progress of accurate long-read sequencing technologies (e.g., circular consensus sequencing from PacBio sequencers) we can dive into studying repeat regions of genome (e.g., segmental duplications) and hard-to-detect variants (e.g., complex structural variants). Results: We propose a novel framework for addressing the comparative genome analysis by discovery of strings that are specific to one genome ("samples-specific" strings). We have developed an accurate and efficient novel method for discovery of samples-specific strings between two groups of WGS samples. The proposed approach will give us the ability to perform comparative genome analysis without the need to map the reads and is not hindered by shortcomings of the reference genome. We show that the proposed approach is capable of accurately finding samples-specific strings representing nearly all variation (>98%) reported across pairs or trios of WGS samples using accurate long reads (e.g., PacBio HiFi data). Availability: The proposed tool is publicly available at https://github.com/Parsoa/PingPong.

“…time as they are sequenced [39]. As long read sequencing replaces the current technology, the advantages of TGA will increase further, because indexing reads will be quicker [40] and the number of blueprint update steps will decrease.…”

Section: Blueprint Genome (mentioning)

confidence: 99%