Design guidelines for high-performance SCM hierarchies

Ustiugov, Dmitrii; Daglis, Alexandros; Picorel, Javier; Sutherland, Mark; Bugnion, Edouard; Falsafi, Babak; Pnevmatikatos, Dionisios

doi:10.1145/3240302.3240310

Cited by 12 publications

(11 citation statements)

References 65 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…However, such a scheme suffers from long latencies and low throughput due to high overhead from the software stack. Prior work such as [18,20,45] has tackled this high overhead by enabling cache line access to the SSD [18,20], using host DRAM as a cache [45], merging multiple translation layers into one layer, and promoting pages from SSD when locality is detected [18]. Due to its benefits, Flash memory is abundantly available, extensively used, and continues to be optimized to provide better performance [13,22,35].…”

Section: Flash As Memorymentioning

confidence: 99%

Tearing Down the Memory Wall

Qureshi,

Mailthody,

Min

et al. 2020

Preprint

View full text Add to dashboard Cite

We present a vision for the Erudite architecture that redefines the compute and memory abstractions such that memory bandwidth and capacity become first-class citizens along with compute throughput. In this architecture, we envision coupling a high-density, massively parallel memory technology like Flash with programmable near-data accelerators, like the streaming multiprocessors in modern GPUs. Each accelerator has a local pool of storage-class memory that it can access at high throughput by initiating very large numbers of overlapping request that help to tolerate long access latency. The accelerators can also communicate with each other and remote memory through a high-throughput low-latency interconnect. As a result, systems based on the Erudite architecture scale compute and memory bandwidth at the same rate, tearing down the notorious memory wall that has plagued computer architecture for generations. In this paper, we present the motivation, rationale, design, benefit, and research challenges for Erudite.

show abstract

Section: Flash As Memorymentioning

confidence: 99%

Tearing Down the Memory Wall

Qureshi,

Mailthody,

Min

et al. 2020

Preprint

View full text Add to dashboard Cite

show abstract

“…First, many modern online services have msscale end-to-end tail-latency constraints [18], [23], which allows them to absorb few µs-scale flash accesses [14], [18], [22], [41], [42], [44]. Second, object popularity and request distributions for datacenter workloads are inherently skewed [64], [73], [75], [76], thus allowing hosting the hot fraction of the dataset in DRAM that serves most requests and filters the bandwidth required from the backing flash. The above observations should permit the design of a cost-effective two-tier hierarchy where a capacity-constrained DRAM caches the hot fraction of the dataset stored in a capacity-scaled flash layer.…”

Section: Introductionmentioning

confidence: 99%

AstriFlash A Flash-Based System for Online Services

Gupta¹,

Oh²,

Liu³

et al. 2023

2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

Self Cite

View full text Add to dashboard Cite

Modern datacenters host datasets in DRAM to offer large-scale online services with tight tail-latency requirements. Unfortunately, as DRAM is expensive and increasingly difficult to scale, datacenter operators are forced to consider denser storage technologies. While modern flash-based storage exhibits µs-scale access latency, which is well within the tail-latency constraints of many online services, traditional demand paging abstraction used to manage memory and storage incurs high overheads and prohibits flash usage in online services. We introduce AstriFlash, a hardware-software co-design that tightly integrates flash and DRAM with ns-scale overheads. Our evaluation of server workloads with cycle-accurate full-system simulation shows that AstriFlash achieves 95% of a DRAM-only system's throughput while maintaining the required 99th-percentile tail latency and reducing the memory cost by 20x.

show abstract

“…This approach allows for the memory to be scaled out independently of the processing resources, because scaling up the memory within a node is becoming increasingly prohibitive [108]. This "share something" approach for remote memory targets some of the challenges that are very relevant in today's data centers, some of which that are:…”

Section: Memory Management Across Virtualized Nodesmentioning

confidence: 99%

“…When memory is disaggregated and becomes a pooled resource, it is possible to use different types of memory technologies besides the traditional DRAM, such as Storage-Class Memories (SCMs) [108], a term that refers generally to Non-Volatile Memories (NVMs) [77]. NVMs provide a good option for pooled disaggregated memory because it is able to provide higher density and higher capacity than traditional DRAMs, but at a much lower cost [28].…”

Section: The Memory Management Problem With Virtualization Andmentioning

confidence: 99%

“…NVMs provide a good option for pooled disaggregated memory because it is able to provide higher density and higher capacity than traditional DRAMs, but at a much lower cost [28]. However, NVMs have higher latency than DRAM (but still less latency than Flash or disk devices), less bandwidth and has endurance limitations [108,126,28]. NVMs can be used as an additional level in the memory hierarchy functioning as a cache between DRAM and permanent storage (even in cases when it is not pooled) [108,110], as block-device for swapped pages [126,40], or for increasing system resiliency through data replication [122] or as back-up for system memory [96].…”

Section: The Memory Management Problem With Virtualization Andmentioning

confidence: 99%

See 1 more Smart Citation

Virtualization techniques for memory resource exploitation

Platero¹

View full text Add to dashboard Cite

Cloud infrastructures have become indispensable in our daily lives with the rise of cloud-based services offered by companies like Facebook, Google, Amazon and many others. These cloud infrastructures use a large numbers of servers provisioned with their own computing resources. Each of these servers use a piece of software, called the Hypervisor (``HV''), that allows them to create multiple virtual instances of the server's physical computing resources and abstract them into "Virtual Machines'' (VMs). A VM runs an Operating System, which in turn runs the applications. The VMs within the servers generate varying memory demand behavior. When the demand increases, costly operations such as (virtual) disk accesses and/or VM migrations can occur. As a result, it is necessary to optimize the utilization of the local memory resources within a single computing server. However, pressure on the memory resources can still increase, making it necessary to migrate the VM to a different server with larger memory or add more memory to the same server. At this point, it is important to consider that some of the servers in the cloud infrastructure might have memory resources that they are not using. Considering the possibility to make memory available to the server, new architectures have been introduced that provide hardware support to enable servers to share their memory capacity. This thesis presents multiple contributions to the memory management problem. First, it addresses the problem of optimizing memory resources in a virtualized server through different types of memory abstractions. Two full contributions are presented for managing memory within a single server called SmarTmem and CARLEMM. In this respect, a third contribution is also presented, called CAVMem, that works as the foundation for CARLEMM. Second, this thesis presents two contributions for memory capacity aggregation across multiple servers, offering two mechanisms called GV-Tmem and vMCA, this latter being based on GV-Tmem but with significant enhancements. These mechanisms distribute the server's total memory within a single-server and globally across computing servers using a user-space process with high-level memory management policies. Las infraestructuras para la nube se han vuelto indispensables en nuestras vidas diarias con la proliferación de los servicios ofrecidos por compañías como Facebook, Google, Amazon entre otras. Estas infraestructuras utilizan una gran cantidad de servidores proveídos con sus propios recursos computacionales. Cada unos de estos servidores utilizan un software, llamado el Hipervisor (“HV”), que les permite crear múltiples instancias virtuales de los recursos físicos de computación del servidor y abstraerlos en “Máquinas Virtuales” (VMs). Una VM ejecuta un Sistema Operativo (OS), el cual a su vez ejecuta aplicaciones. Las VMs dentro de los servidores generan un comportamiento variable de demanda de memoria. Cuando la demanda de memoria aumenta, operaciones costosas como accesos al disco (virtual) y/o migraciones de VMs pueden ocurrir. Como resultado, es necesario optimizar la utilización de los recursos de memoria locales dentro del servidor. Sin embargo, la demanda por memoria puede seguir aumentando, haciendo necesario que la VM migre a otro servidor o que se añada más memoria al servidor. En este punto, es importante considerar que algunos servidores podrían tener recursos de memoria que no están utilizando. Considerando la posibilidad de hacer más memoria disponible a los servidores que lo necesitan, nuevas arquitecturas de servidores han sido introducidos que brindan el soporte de hardware necesario para habilitar que los servidores puedan compartir su capacidad de memoria. Esta tesis presenta múltiples contribuciones para el problema de manejo de memoria. Primero, se enfoca en el problema de optimizar los recursos de memoria en un servidor virtualizado a través de distintos tipos de abstracciones de memoria. Dos contribuciones son presentadas para administrar memoria de manera automática dentro de un servidor virtualizado, llamadas SmarTmem y CARLEMM. En este contexto, una tercera contribución es presentada, llamada CAVMem, que proporciona los fundamentos para el desarrollo de CARLEMM. Segundo, la tesis presenta dos contribuciones enfocadas en la agregación de capacidad de memoria a través de múltiples servidores, ofreciendo dos mecanismos llamados GV-Tmem y vMCA, siendo este último basado en GV-Tmem pero con mejoras significativas. Estos mecanismos administran la memoria total de un servidor a nivel local y de manera global a lo largo de los servidores de la infraestructura de nube utilizando un proceso de usuario que implementa políticas de manejo de ...

show abstract

Design guidelines for high-performance SCM hierarchies

Cited by 12 publications

References 65 publications

Tearing Down the Memory Wall

Tearing Down the Memory Wall

AstriFlash A Flash-Based System for Online Services

Virtualization techniques for memory resource exploitation

Contact Info

Product

Resources

About