We anticipate that SeqAn will continue to be a valuable resource, especially since it started to actively support various hardware acceleration techniques in a systematic manner.
The unidirectional FM index was introduced by Ferragina and Manzini in 2000 and allows to search a pattern in the index in one direction. The bidirectional FM index (2FM) was introduced by Lam et al. in 2009. It allows to search for a pattern by extending an infix of the pattern arbitrarily to the left or right. The method of Lam et al. can conduct one step in time O(σ) while needing space O(σ · n) using constant time rank queries on bit vectors. Schnattinger and colleagues improved this time to O(log σ) while usingO(log σ · n) bits of space for both, the FM and 2FM index. This is achieved by the use of binary wavelet trees. In this paper we introduce a new, practical method for conducting an exact search in a uni-and bidirectional FM index in O(1) time per step while using O(log σ · n) + o(log σ · σ · n) bits of space. This is done by replacing the binary wavelet tree by a new data structure, the Enhanced Prefixsum Rank dictionary (EPR-dictionary). We implemented this method in the SeqAn C++ library and experimentally validated our theoretical results. In addition we compared our implementation with other freely available implementations of bidirectional indices and show that we are between ≈ 2.6 − 4.8 times faster. This will have a large impact for many bioinformatics applications that rely on practical implementations of (2)FM indices e.g. for read mapping. To our knowledge this is the first implementation of a constant time method for a search step in 2FM indices.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.