The emergence of non-volatile memory DIMMs such as Intel Optane DCPMM blurs the line between conventional volatile memory and persistent storage by enabling byte-accessible persistent memory with reasonable performance. This new hardware supports many possible use cases for high-performance applications, from high-performance storage to very-high-capacity volatile memory (terabytes). However, the numerous ways to configure the memory subsystem raise the question of how to configure nodes to satisfy applications' needs (memory, storage, fault tolerance, etc.). We focus on the issue of partitioning HPC nodes with NVDIMMs in the context of co-scheduling multiple jobs. We show that the basic NVDIMM configuration modes would require node reboots and expensive hardware configuration. Moreover, they do not allow the co-scheduling of all kinds of jobs, and they do not always allow locality to be taken into account during resource allocation. We then show that using 1-Level-Memory and the Device DAX mode by default is a good compromise: it can be easily used and partitioned for storage and memory-bound applications with locality awareness.
The complexity of the memory system has increased dramatically in the last decade. As a result, high-performance computers include multi-level, heterogeneous, and non-uniform memories, each with significantly different properties. For example, a memory system nowadays may include three types of memory: low-latency memory (DDR), high-bandwidth memory (HBM), and high-capacity memory (NVM), not to mention multiple NUMA domains. Because these memories are numerous and differ significantly in their characteristics, scientific application developers face a tremendous challenge: leveraging the memory system effectively to improve performance and productivity. In this work, we present M&MMs, an interface to help manage the memory system complexity. It comprises a set of memory attributes and an API to express and manage the diverse memory characteristics using high-level metrics that are easy to understand. Our goal is to establish a building block that enables next-generation runtime systems, computing libraries, and scientific applications to leverage the best performance attributes of each memory, e.g., to combine the bandwidth of the fastest memory with the capacity of the largest memory. We believe M&MMs is a natural extension of hwloc, one that focuses on the memory system, since hwloc exposes the locality of the hardware resources and is the de facto standard for hardware topology discovery.
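The idea of describing memories by high-level attributes and querying for the best one can be illustrated with a minimal sketch. This is not the M&MMs or hwloc API: the `MemoryTarget` class, the `best_target` function, the attribute names, and all numeric values below are hypothetical placeholders chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryTarget:
    """Hypothetical description of one memory kind by high-level attributes."""
    name: str
    bandwidth_gbs: float  # assumed unit: GB/s
    latency_ns: float     # assumed unit: nanoseconds
    capacity_gb: float    # assumed unit: GB


# For each attribute, record whether a larger value is considered better.
HIGHER_IS_BETTER = {
    "bandwidth_gbs": True,
    "latency_ns": False,
    "capacity_gb": True,
}


def best_target(targets, attribute):
    """Return the target with the best value for the given attribute."""
    if attribute not in HIGHER_IS_BETTER:
        raise ValueError(f"unknown attribute: {attribute}")

    def value(target):
        return getattr(target, attribute)

    if HIGHER_IS_BETTER[attribute]:
        return max(targets, key=value)
    return min(targets, key=value)


# Placeholder numbers, not measurements of any real system.
targets = [
    MemoryTarget("DDR", bandwidth_gbs=100.0, latency_ns=80.0, capacity_gb=192.0),
    MemoryTarget("HBM", bandwidth_gbs=400.0, latency_ns=100.0, capacity_gb=16.0),
    MemoryTarget("NVM", bandwidth_gbs=30.0, latency_ns=300.0, capacity_gb=1536.0),
]

if __name__ == "__main__":
    for attribute in HIGHER_IS_BETTER:
        print(attribute, "->", best_target(targets, attribute).name)
```

With these placeholder values, a bandwidth-bound buffer would be directed to the HBM entry and a capacity-bound one to the NVM entry, which is the kind of decision an attribute-based interface is meant to let runtimes express without hard-coding memory types.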