“…Saying it another way, although we put little effort into choosing configurations that could achieve cost savings, we still found these cost savings occurred 20% of the time. If we put more effort into choosing such configurations, perhaps by incorporating the work of Malakar [12,13], who had complementary ideas on choosing allocation sizes and analysis frequencies, this proportion could rise significantly. A twin benefit of choosing an appropriately sized in-transit allocation is that potentially more nodes would be available for simulation use, since over-allocating an in-transit allocation can limit the maximum size of a simulation scaling run.…”
Section: Discussion
“…As such, their findings differ from ours. Malakar et al. did twin studies on cost models, one for in-line [12] and one for in-transit [13]. Once again, these studies did not consider V_CEF.…”
We analyze the opportunities for in-transit visualization to provide cost savings compared to in-line visualization. We begin by developing a cost model that includes factors related to both in-line and in-transit processing, which allows comparisons to be made between the two methods. We then run a series of studies to create a corpus of data for our model. We run two different visualization algorithms, one computation-heavy and one communication-heavy, at concurrencies up to 32,768 cores. Our primary results lie in exploring the cost model within the context of our corpus. Our findings show that in-transit processing consistently achieves significant cost efficiencies by running visualization algorithms at lower concurrency, and that in many cases these efficiencies are enough to offset other costs (transfer, blocking, and additional nodes) and be cost effective overall. Finally, this work informs future studies, which can focus on choosing ideal configurations for in-transit processing that consistently achieve cost efficiencies.
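The tradeoff described in this abstract can be sketched as a toy node-seconds model. Everything below is an illustrative assumption for exposition: the function names, the cost formula, and all timing values are invented here, not taken from the paper's actual cost model.

```python
# Toy node-seconds cost comparison between in-line and in-transit
# visualization. All names and numbers are illustrative assumptions.

def inline_cost(sim_nodes, vis_time):
    # In-line: every simulation node blocks for the full visualization time.
    return sim_nodes * vis_time

def intransit_cost(sim_nodes, vis_nodes, vis_time, transfer_time, blocking_time):
    # In-transit: simulation nodes pay only transfer + blocking time, while a
    # smaller dedicated allocation runs the visualization at lower concurrency.
    return sim_nodes * (transfer_time + blocking_time) + vis_nodes * vis_time

# Visualization at lower concurrency takes longer per step but consumes far
# fewer node-seconds, which can offset the transfer and blocking costs.
inline = inline_cost(sim_nodes=1024, vis_time=10.0)
intransit = intransit_cost(sim_nodes=1024, vis_nodes=64, vis_time=40.0,
                           transfer_time=1.0, blocking_time=0.5)
print(inline, intransit)  # 10240.0 4096.0
```

With these made-up numbers, in-transit wins (4096 vs. 10240 node-seconds) even though its visualization step takes four times longer, which mirrors the paper's point that lower-concurrency efficiency can offset transfer, blocking, and extra-node costs.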
“…Should certain components be placed on a single node (simplifying implementation, but limiting performance) or on multiple nodes? Different choices may make different intra- and inter-component communication mechanisms available, each with different performance characteristics (Choi et al., 2018; Malakar et al., 2015, 2016, 2018).…”
Section: Perspectives on ODAR and Co-design
A growing disparity between supercomputer computation speeds and I/O rates means that it is rapidly becoming infeasible to analyze supercomputer application output only after that output has been written to a file system. Instead, data-generating applications must run concurrently with data reduction and/or analysis operations, with which they exchange information via high-speed methods such as interprocess communications. The resulting parallel computing motif, online data analysis and reduction (ODAR), has important implications for both application and HPC systems design. Here we introduce the ODAR motif and its co-design concerns, describe a co-design process for identifying and addressing those concerns, present tools that assist in the co-design process, and present case studies to illustrate the use of the process and tools in practical settings.
“…Different approaches have been proposed in the literature to ascertain the performance gains brought by an in-situ (or in-transit) execution of a given scientific workflow application and to determine the best configuration deployment of its components on a given target platform. We distinguish these approaches depending on whether they rely on actual experiments [3][4][5][6][7] or resort to simulation [8][9][10] to evaluate the performance of in-situ workflows. The former are intrinsically time- and resource-consuming, while the latter may suffer from simplification biases introduced when abstract versions of the in-situ workflow components are developed.…”
The amount of data generated by numerical simulations in scientific domains such as molecular dynamics, climate modeling, biology, and astrophysics has led to a fundamental redesign of application workflows. The throughput and capacity of storage subsystems have not evolved as fast as the computing power of extreme-scale supercomputers, so the classical post-hoc analysis of simulation outputs has become highly inefficient. In-situ workflows have emerged as a solution in which simulation and data analytics are intertwined through shared computing resources, thus lowering latencies.

Determining the best allocation (i.e., how many resources to allocate to each component of an in-situ workflow) and mapping (i.e., where, and at which frequency, to run the data analytics component) is a complex task whose performance assessment is crucial to the efficient execution of in-situ workflows. However, evaluating the performance of different allocation and mapping strategies usually relies either on running them directly on the target execution environments, which can rapidly become extremely time- and resource-consuming, or on simulating simplified models of the workflow components, which can lack realism. In both cases, the validity of the performance evaluation is limited.

To address this issue, we introduce Sim-Situ, a framework for the faithful simulation of in-situ workflows. The framework builds on the SimGrid toolkit and benefits from several important features of this versatile simulation tool. We designed Sim-Situ to reflect the typical structure of in-situ workflows; thanks to its modular design, it has the flexibility to easily and faithfully evaluate the behavior and performance of various allocation and mapping strategies. We illustrate the simulation capabilities of Sim-Situ on a Molecular Dynamics use case, studying the impact of different allocation and mapping strategies on performance and showing how users can leverage Sim-Situ to identify interesting tradeoffs when designing their in-situ workflow.
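The kind of allocation tradeoff such a simulator explores can be sketched with a toy sweep. This is not Sim-Situ's or SimGrid's API; the Amdahl-style scaling model, the blocking rule, and every number below are assumptions made up for illustration.

```python
# Toy sweep over in-transit allocation sizes, illustrating the allocation
# tradeoff a simulator can expose. All model terms and timings are invented.

def total_cost(sim_nodes, vis_nodes, sim_step=30.0,
               serial=5.0, parallel=2000.0, transfer=1.0):
    # Amdahl-style visualization time on the dedicated analysis allocation.
    vis_time = serial + parallel / vis_nodes
    # The simulation blocks only when visualization cannot keep up with a step.
    blocking = max(0.0, vis_time - sim_step)
    # Node-seconds: simulation nodes pay transfer + blocking; vis nodes pay vis_time.
    return sim_nodes * (transfer + blocking) + vis_nodes * vis_time

for vis_nodes in (16, 64, 128, 256):
    print(vis_nodes, total_cost(1024, vis_nodes))
```

Under these assumptions, a too-small allocation (16 nodes) makes the simulation block and dominates the cost, while a too-large one (256 nodes) wastes node-seconds; the sweep finds a sweet spot near 128 nodes. Replacing this toy model with a faithful simulation of the real components is exactly the gap a tool like Sim-Situ targets.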