Structurally elucidating transition pathways between protein conformations gives deep mechanistic insight into protein behavior but is typically difficult. Unbiased molecular dynamics (MD) simulations provide one solution, but their computational expense is often prohibitive, motivating the development of enhanced sampling methods that accelerate conformational changes in a given direction, embodied in a collective variable. The accuracy of such methods is unclear for complex protein transitions, because obtaining unbiased MD data for comparison is difficult. Here, we use long-time scale, unbiased MD simulations of epidermal growth factor receptor kinase deactivation as a complex biological test case for two widely used methods-steered molecular dynamics (SMD) and the string method. We found that common collective variable choices, based on the root-mean-square deviation (RMSD) of the entire protein, prevented the methods from producing accurate paths, even in SMD simulations on the time scale of the unbiased transition. Using collective variables based on the RMSD of the region of the protein known to be important for the conformational change, however, enabled both methods to provide a more accurate description of the pathway in a fraction of the simulation time required to observe the unbiased transition.
Abstract-Special-purpose computing hardware can provide significantly better performance and power efficiency for certain applications than general-purpose processors. Even within a single application area, however, a special-purpose machine can be far more valuable if it is capable of efficiently supporting a number of different computational methods that, taken together, expand the machine's functionality and range of applicability. We have previously described a massively parallel special-purpose supercomputer, called Anton, and have shown that it executes traditional molecular dynamics simulations orders of magnitude faster than the previous state of the art. Here, we describe how we extended Anton's software to support a more diverse set of methods, allowing scientists to simulate a broader class of biological phenomena at extremely high speeds. Key elements of our approach, which exploits Anton's tightly integrated hardwired pipelines and programmable cores, are applicable to the hardware and software design of various other specialized or heterogeneous parallel computing platforms.
Our digital universe is growing, creating exploding amounts of data which need to be searched, protected and filtered. String searching is at the core of the tools we use to curb this explosion, such as search engines, network intrusion detection systems, spam filters, and anti-virus programs. But as communication speed grows, our capability to perform string searching in real-time seems to fall behind. Multi-core architectures promise enough computational power to cope with the incoming challenge, but it is still unclear which algorithms and programming models to use to unleash this power.We have parallelized a popular string searching algorithm, Aho-Corasick, on the IBM Cell/B.E. processor, with the goal of performing exact string matching against large dictionaries. In this article we propose a novel approach to fully exploit the DMA-based communication mechanisms of the Cell/B.E. to provide an unprecedented level of aggregate performance with irregular access patterns.We have discovered that memory congestion plays a crucial role in determining the performance of this algorithm. We discuss three aspects of congestion: memory pressure, layout issues and hot spots, and we present a collection of algorithmic solutions to alleviate these problems and achieve quasi-optimal performance.The implementation of our algorithm provides a worst-case throughput of 2.5 Gbps, and a typical throughput between 3.3 and 4.4 Gbps, measured on realistic scenarios with a twoprocessor Cell/B.E. system.
Reverse-Time Migration (RTM) is a state-of-the-art technique in seismic acoustic imaging, because of the quality and integrity of the images it provides. Oil and gas companies trust RTM with crucial decisions on multi-million-dollar drilling investments. But RTM requires vastly more computational power than its predecessor techniques, and this has somewhat hindered its practical success. On the other hand, despite multi-core architectures promise to deliver unprecedented computational power, little attention has been devoted to mapping efficiently RTM to multi-cores. In this paper, we present a mapping of the RTM computational kernel to the IBM Cell/B.E. processor that reaches close-to-optimal performance. The kernel proves to be memory-bound and it achieves a 98% utilization of the peak memory bandwidth. Our Cell/B.E. implementation outperforms a traditional processor (PowerPC 970MP) in terms of performance (with an 15.0× speedup) and energy-efficiency (with a 10.0× increase in the GFlops/W delivered). Also, it is the fastest RTM implementation available to the best of our knowledge. These results increase the practical usability of RTM. Also, the RTM-Cell/B.E. combination proves to be a strong competitor in the seismic arena.
Matching regular expressions (regexps) is a very common workload. For example, tokenization, which consists of recognizing words or keywords in a character stream, appears in every search engine indexer. Tokenization also consumes 30% or more of most XML processors' execution time and represents the first stage of any programming language compiler.Despite the multi-core revolution, regexp scanner generators like flex haven't changed much in 20 years, and they do not exploit the power of recent multi-core architectures (e.g., multiple threads and wide SIMD units). This is unfortunate, especially given the pervasive importance of search engines and the fast growth of our digital universe. Indexing such data volumes demands precisely the processing power that multi-cores are designed to offer.We present an algorithm and a set of techniques for using multicore features such as multiple threads and SIMD instructions to perform parallel regexp-based tokenization.As a proof of concept, we present a family of optimized kernels that implement our algorithm, providing the features of flex on the Cell/B.E. processor at top performance. Our kernels achieve almost-ideal resource utilization (99.2% of the clock cycles are non-NOP issues). They deliver a peak throughput of 14.30 Gbps per Cell chip, and 9.76 Gbps on Wikipedia input: a remarkable performance, comparable to dedicated hardware solutions. Also, our kernels show speedups of 57-81× over flex on the Cell.Our approach is valuable because it is easily portable to other SIMD-enabled processors, and there is a general trend toward more and wider SIMD instructions in architecture design.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.