HMMER, based on the profile hidden Markov model (HMM), is one of the most widely used sequence database search tools, allowing researchers to compare HMMs against sequence databases or sequences against HMM databases. Such searches often take many hours and consume a large number of CPU cycles on modern computers. We present a cluster-enabled, hardware/software-accelerated implementation of the HMMER search tool hmmsearch. Our results show that combining the parallel efficiency of a cluster with one or more high-speed hardware accelerators (FPGAs) can significantly improve performance for even the most time-consuming searches, often reducing search times from several hours to minutes.
Protein sequence analysis tools that predict the homology, structure and function of particular peptide sequences exist in abundance. One of the most commonly used is the profile hidden Markov model algorithm developed by Eddy and coworkers [Durbin et al., 1998]. These tools allow scientists to construct a mathematical model (a hidden Markov model, or HMM) of a set of aligned protein sequences with known similar function and homology, which can then be applied to a large database of proteins. The tools generate a log-odds score indicating whether a protein belongs to the same family as the proteins that generated the HMM or to a set of random, unrelated sequences.
Due to the complexity of the calculation, and the possibility of applying many HMMs to a single sequence (a Pfam search), these calculations require a significant number of processing cycles. Efforts to accelerate these searches have resulted in several platform- and hardware-specific variants, including an Altivec port by Lindahl [Lindahl, 2005], a GPU port of hmmsearch by Horn et al. of Stanford [Horn et al., 2005], and several optimizations performed by the authors of this chapter. These optimizations range from minimal source code changes with some impact on performance to recasting the core algorithms in terms of a different computing technology, thus fundamentally altering the calculation. Each approach has specific benefits and costs. Detailed descriptions of the authors' modifications can also be found in [Walters et al., 2006].
The remainder of this chapter is organized as follows. In section 1.2 we give a brief overview of HMMER and the underlying Plan 7 architecture. In section 1.3 we discuss several strategies that have been used to implement and accelerate HMMER on a variety of platforms. In section 1.4 we detail our optimizations and provide performance details. We conclude this chapter in section 1.5.