Background: Various indexing techniques have been applied by next-generation sequencing read mapping tools. The choice of a particular data structure is a trade-off between memory consumption, mapping throughput, and construction time. Results: We present the succinct hash index, a novel data structure for read mapping. It is a variant of the classical q-gram index with a particularly small memory footprint, occupying between 3.5 and 5.3 GB for a human reference genome under typical parameter settings. The succinct hash index features two novel seed selection algorithms (group seeding and variable-length seeding) and an efficient parallel construction algorithm, which we have implemented in FEM (Fast and Efficient read Mapper). FEM can return all read mappings within a given edit distance. Our experimental results show that FEM is scalable and outperforms other state-of-the-art all-mappers in terms of both speed and memory footprint. Compared to Masai, FEM is an order of magnitude faster using a single thread and two orders of magnitude faster using multiple threads. Furthermore, we observe a speedup of up to 2.8-fold compared to BitMapper and an order-of-magnitude speedup compared to BitMapper2 and Hobbes3. Conclusions: The presented succinct index is the first feasible implementation of the q-gram index functionality that occupies around 3.5 GB of memory for a whole human reference genome. FEM is freely available at https://github.com/haowenz/FEM.
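For readers unfamiliar with the classical q-gram index that the succinct hash index builds on, the following is a minimal sketch of the basic idea. It is not FEM's data structure: it uses a plain Python dictionary with no memory optimization, picks non-overlapping seeds rather than group or variable-length seeding, and the function names `build_qgram_index` and `candidate_positions` are hypothetical.

```python
from collections import defaultdict


def build_qgram_index(reference, q):
    """Map every length-q substring of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return index


def candidate_positions(index, read, q):
    """Look up non-overlapping seeds of the read and collect the reference
    positions where the read could start. Candidates still need verification
    (for example with an edit distance computation)."""
    candidates = set()
    for offset in range(0, len(read) - q + 1, q):
        seed = read[offset:offset + q]
        for pos in index.get(seed, ()):
            start = pos - offset  # implied start of the whole read
            if start >= 0:
                candidates.add(start)
    return sorted(candidates)


reference = "ACGTACGTTAGC"
index = build_qgram_index(reference, 4)
print(candidate_positions(index, "ACGTTAGC", 4))
```

A dictionary of Python lists is far larger than the footprint reported above; the sketch only illustrates the lookup functionality that a q-gram index provides.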
In this paper we present XSW, a new parallel Smith-Waterman algorithm for searching protein sequence databases on the Xeon Phi coprocessor. To make full use of the compute power of the many-core Xeon Phi hardware, we use a two-level parallelization scheme that combines coarse-grained parallelism at the thread level with fine-grained parallelism at the VPU level. At the thread level, XSW employs multi-threading to implement the SIMD parallelism. At the VPU level, we use the Knights Corner instructions to gain more data parallelism. We have also reorganized the database and made use of the parallel shuffling operations on Xeon Phi to achieve better I/O efficiency. Evaluations on real protein sequence databases show that XSW achieves a peak performance of 70 GCUPS on a single Intel Xeon Phi 7110 card. We compare XSW with two other well-parallelized Smith-Waterman implementations: the multi-core CPU-based SWIPE and the GPU-based CUDASW++ 3.0. XSW achieves much better performance than SWIPE, and comparable performance but better accuracy than CUDASW++ 3.0. To our knowledge, this is the first reported implementation of the Smith-Waterman algorithm on Xeon Phi. The executable binary code of XSW is available at http://sdu-hpcl.github.io/XSW/.
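As a reminder of the recurrence that XSW parallelizes, here is a minimal scalar sketch of Smith-Waterman local alignment scoring. It is not the XSW implementation: it runs on one thread with no vectorization, and it assumes a simple match/mismatch score with a linear gap penalty, whereas protein database search normally uses a substitution matrix and affine gaps. The function name and default scores are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Each inner-loop iteration is one cell update, the unit counted by the GCUPS (giga cell updates per second) figure quoted above.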
Motivation: Modern bioinformatics tools for analyzing large-scale NGS datasets often need to include fast implementations of core sequence alignment algorithms in order to achieve reasonable execution times. We address this need by presenting the BGSA toolkit for optimized implementations of popular bit-parallel global pairwise alignment algorithms on modern microprocessors. Results: BGSA outperforms Edlib, SeqAn and BitPAl for pairwise edit distance computations and Parasail, SeqAn and BitPAl when using more general scoring schemes for pairwise alignments of a batch of sequence reads on both standard multi-core CPUs and Xeon Phi many-core CPUs. Furthermore, banded edit distance performance of BGSA on a Xeon Phi-7210 outperforms the highly optimized NVBio implementation on a Titan X GPU for the seed verification stage of a read mapper by a factor of 4.4. Availability and implementation: BGSA is open-source and available at https://github.com/sdu-hpcl/BGSA. Supplementary information: Supplementary data are available at Bioinformatics online.
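To illustrate what "bit-parallel global pairwise alignment" means, the sketch below computes global edit distance with Myers' bit-vector algorithm in its global (Hyyrö-style) formulation, where the horizontal delta in row 0 is always +1. This is an assumption about a representative member of the algorithm family, not BGSA's code: BGSA targets fixed-width SIMD registers and batches of reads, while this sketch relies on Python's arbitrary-precision integers as a single long bit vector. The function name is hypothetical.

```python
def bit_parallel_edit_distance(pattern, text):
    """Global edit distance via Myers' bit-vector algorithm."""
    m = len(pattern)
    if m == 0:
        return len(text)
    mask = (1 << m) - 1
    top = 1 << (m - 1)
    # peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)
    pv, mv, score = mask, 0, m  # first column: D[i][0] = i
    for c in text:
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask  # horizontal delta +1
        mh = pv & xh                   # horizontal delta -1
        if ph & top:
            score += 1
        elif mh & top:
            score -= 1
        ph = ((ph << 1) | 1) & mask    # global: row 0 grows by 1 per column
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask  # vertical delta +1
        mv = ph & xv                   # vertical delta -1
    return score


print(bit_parallel_edit_distance("kitten", "sitting"))
```

The whole dynamic programming column is updated with a constant number of word operations per text character, which is what makes this family of algorithms attractive for wide vector units.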