Many scientific problems require multiple distinct computational tasks to be executed in order to achieve a desired solution. We introduce the Ensemble Toolkit (EnTK) to address the challenges of scale, diversity and reliability that such applications pose. We describe the design and implementation of EnTK, characterize its performance and integrate it with two exemplar use cases: seismic inversion and adaptive analog ensembles. We perform nine experiments, characterizing EnTK overheads, strong and weak scalability, and the performance of the two use case implementations, at scale and on production infrastructures. We show how EnTK meets the following general requirements: (i) implementing dedicated abstractions to support the description and execution of ensemble applications; (ii) support for execution on heterogeneous computing infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv) task-level fault tolerance. We discuss novel computational capabilities that EnTK enables and the scientific advantages arising therefrom. We propose EnTK as an important addition to the suite of tools in support of production scientific computing.
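To illustrate the kind of dedicated abstraction that requirement (i) refers to, the sketch below models an ensemble application as pipelines of stages of tasks. The class names and fields are illustrative stand-ins for this style of abstraction, not EnTK's actual API.

```python
# Minimal illustrative model of a pipeline/stage/task abstraction for
# ensemble applications. Names are stand-ins, not the toolkit's real API.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    executable: str                 # e.g. a simulation or analysis binary
    arguments: List[str] = field(default_factory=list)

@dataclass
class Stage:
    tasks: List[Task] = field(default_factory=list)    # tasks run concurrently

@dataclass
class Pipeline:
    stages: List[Stage] = field(default_factory=list)  # stages run in order

def describe(pipeline: Pipeline) -> str:
    """Render a pipeline as a chain of task counts for a quick sanity check."""
    return " -> ".join(f"{len(s.tasks)} task(s)" for s in pipeline.stages)

# An ensemble of 4 simulations followed by a single aggregation step.
sim = Stage(tasks=[Task("md_engine", ["--steps", "1000"]) for _ in range(4)])
agg = Stage(tasks=[Task("analyze", ["--merge"])])
p = Pipeline(stages=[sim, agg])
print(describe(p))   # 4 task(s) -> 1 task(s)
```

The point of the abstraction is that the user declares *what* the ensemble looks like; the runtime system decides how to map tasks onto heterogeneous resources.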
CoCo (“complementary coordinates”) is a method for ensemble enrichment based on principal component analysis (PCA) that was developed originally for the investigation of NMR data. Here we investigate the potential of the CoCo method, in combination with molecular dynamics simulations (CoCo-MD), to be used more generally for the enhanced sampling of conformational space. Using the alanine penta-peptide as a model system, we find that an iterative workflow, interleaving short multiple-walker MD simulations with long-range jumps through conformational space informed by CoCo analysis, can increase the rate of sampling of conformational space up to 10 times for the same computational effort (total number of MD timesteps). Combined with the reservoir-REMD method, free energies can be readily calculated. An approximate but fast and practically useful alternative approach to unbiasing CoCo-MD-generated data is also described. Applied to cyclosporine A, we can achieve far greater conformational sampling than has been reported previously, using a fraction of the computational resource. Simulations of the maltose binding protein, begun from the “open” state, effectively sample the “closed” conformation associated with ligand binding. The PCA-based approach means that optimal collective variables to enhance sampling need not be defined in advance by the user but are identified automatically and are adaptive, responding to the characteristics of the developing ensemble. In addition, the approach does not require any adaptations to the associated MD code and is compatible with any conventional MD package.
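A hedged sketch of the core idea may help: use PCA on the sampled conformations to identify the dominant direction of variation, then propose a "long-range jump" target beyond the sampled region along that direction. The toy below works in 2-D with a closed-form leading eigenvector; all names and the extrapolation rule are illustrative assumptions, not the CoCo implementation.

```python
# Toy illustration of PCA-informed jump selection, in the spirit of CoCo-MD:
# find the leading principal axis of sampled points, then extrapolate past
# the most extreme sampled projection into unexplored space.
import math

def principal_axis(points):
    """Leading eigenvector of the 2x2 covariance matrix (closed form)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Rotation angle diagonalizing [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta)), (mx, my)

def jump_target(points, scale=2.0):
    """Extrapolate beyond the extreme sampled projection along PC1."""
    (ax, ay), (mx, my) = principal_axis(points)
    projs = [(p[0] - mx) * ax + (p[1] - my) * ay for p in points]
    edge = max(projs, key=abs)     # furthest sampled point along PC1
    t = scale * edge               # step past it, into unsampled space
    return (mx + t * ax, my + t * ay)

# Toy "ensemble": conformations spread mostly along one coordinate.
ens = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 0.05)]
print(jump_target(ens))
```

In the real method the analysis runs in a high-dimensional conformational space and the new starting structures seed the next round of multiple-walker MD, which is what makes the collective variables adaptive rather than user-defined.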
Recent advances in both theory and methods have created opportunities to simulate biomolecular processes more efficiently using adaptive ensemble simulations. Ensemble-based simulations are used widely to compute a number of individual simulation trajectories and analyze statistics across them. Adaptive ensemble simulations offer a further level of sophistication and flexibility by enabling high-level algorithms to control simulations based on intermediate results. Novel high-level algorithms require sophisticated approaches to utilize the intermediate data during runtime. Thus, there is a need for scalable software systems to support adaptive ensemble-based applications. We describe the operations in executing adaptive workflows, classify different types of adaptations, and describe challenges in implementing them in software tools. We enhance Ensemble Toolkit (EnTK), an ensemble execution system, to support the scalable execution of adaptive workflows on HPC systems, and characterize the adaptation overhead in EnTK. We implement two high-level adaptive ensemble algorithms, expanded ensemble and Markov state modeling, and execute up to 2^12 ensemble members on thousands of cores on three distinct HPC platforms. We highlight scientific advantages enabled by the novel capabilities of our approach. To the best of our knowledge, this is the first attempt at describing and implementing multiple adaptive ensemble workflows using a common conceptual and implementation framework.
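The control loop that distinguishes adaptive from static ensembles can be sketched generically: run a batch, analyze intermediate results, and let a high-level policy decide what the next batch looks like. The policy shown (restart from the least-visited state) is a deliberately simple stand-in for algorithms such as expanded ensemble or Markov state modeling; the toy random-walk "simulation" is an assumption for the sake of a runnable example.

```python
# Generic adaptive ensemble loop: simulate -> analyze -> adapt -> repeat.
import random
from collections import Counter

def run_member(start_state, n_states=5, steps=20, rng=None):
    """Toy 'simulation': a bounded random walk over discrete states."""
    rng = rng or random.Random(0)
    s, visited = start_state, []
    for _ in range(steps):
        s = max(0, min(n_states - 1, s + rng.choice((-1, 1))))
        visited.append(s)
    return visited

def adaptive_ensemble(n_iterations=3, n_members=4, n_states=5):
    rng = random.Random(42)
    counts = Counter()
    starts = [0] * n_members
    for _ in range(n_iterations):
        for s0 in starts:                      # execute the ensemble batch
            counts.update(run_member(s0, n_states, rng=rng))
        # Adaptation step: restart the next batch from the least-sampled
        # state, steering compute toward poorly explored regions.
        least = min(range(n_states), key=lambda s: counts[s])
        starts = [least] * n_members
    return counts

counts = adaptive_ensemble()
print(dict(counts))
```

The runtime-system challenge the abstract points to is precisely the adaptation step: the workflow cannot be fully enumerated up front, so the execution system must accept new or modified tasks while earlier ones are still running.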
This paper describes a building blocks approach to the design of scientific workflow systems. We discuss RADICAL-Cybertools as one implementation of the building blocks concept, showing how they are designed and developed in accordance with this approach. This paper offers three main contributions: (i) showing the relevance of the design principles underlying the building blocks approach to support scientific workflows on high performance computing platforms; (ii) illustrating a set of building blocks that enable multiple points of integration, "unifying" conceptual reasoning across otherwise very different tools and systems; and (iii) case studies discussing how RADICAL-Cybertools are integrated with existing workflow, workload, and general purpose computing systems and used to develop domain-specific workflow systems.
Background: Resistance to chemotherapy and molecularly targeted therapies is a major factor in limiting the effectiveness of cancer treatment. In many cases, resistance can be linked to genetic changes in target proteins, either pre-existing or evolutionarily selected during treatment. Key to overcoming this challenge is an understanding of the molecular determinants of drug binding. Using multi-stage pipelines of molecular simulations we can gain insights into the binding free energy and the residence time of a ligand, which can inform both stratified and personal treatment regimes and drug development. To support the scalable, adaptive and automated calculation of the binding free energy on high-performance computing resources, we introduce the High-throughput Binding Affinity Calculator (HTBAC). HTBAC uses a building block approach in order to attain both workflow flexibility and performance.
Results: We demonstrate close to perfect weak scaling to hundreds of concurrent multi-stage binding affinity calculation pipelines. This permits a rapid time-to-solution that is essentially invariant of the calculation protocol, size of candidate ligands and number of ensemble simulations.
Conclusions: As such, HTBAC advances the state of the art of binding affinity calculations and protocols. HTBAC provides the platform to enable scientists to study a wide range of cancer drugs and candidate ligands in order to support personalized clinical decision making based on genome sequencing and drug discovery.
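The weak-scaling pattern reported here can be sketched in miniature: N independent multi-stage pipelines executed concurrently, so that time-to-solution tracks the longest single pipeline rather than the number of pipelines. The stage names and ligand identifiers below are illustrative placeholders, not HTBAC's actual protocol.

```python
# Sketch of concurrent, independent multi-stage pipelines, one per ligand.
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(ligand_id):
    """One multi-stage pipeline; stages are placeholders for real work."""
    results = []
    for stage in ("minimize", "equilibrate", "production", "analysis"):
        results.append(f"{ligand_id}:{stage}")   # real engines would run here
    return results

ligands = [f"lig-{i}" for i in range(8)]
with ThreadPoolExecutor(max_workers=len(ligands)) as pool:
    outcomes = list(pool.map(run_pipeline, ligands))

print(len(outcomes), "pipelines,", len(outcomes[0]), "stages each")
```

Because the pipelines share no state, adding candidate ligands (with proportionally more resources) leaves the critical path unchanged, which is what "close to perfect weak scaling" amounts to in practice.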