Priyank Faldu scite author profile

Graph analytics power a range of applications in areas as diverse as finance, networking and business logistics. A common property of graphs used in the domain of graph analytics is a power-law distribution of vertex connectivity, wherein a small number of vertices are responsible for a high fraction of all connections in the graph. These richly-connected (hot) vertices inherently exhibit high reuse. However, their sparse distribution in memory leads to a severe underutilization of onchip cache capacity. Prior works have proposed lightweight skewaware vertex reordering that places hot vertices adjacent to each other in memory, reducing the cache footprint of hot vertices and thus improving cache efficiency. However, in doing so, they may inadvertently destroy the inherent community structure within the graph, which may negate the performance gains achieved from the reduced footprint of hot vertices.In this work, we study existing reordering techniques and demonstrate the inherent tension between reducing the cache footprint of hot vertices and preserving original graph structure. We quantify the potential performance loss due to disruption in graph structure for different graph datasets. We further show that reordering techniques that employ fine-grain reordering significantly increase misses in the higher level caches, even when they reduce misses in the last level cache.To overcome the limitations of existing reordering techniques, we propose Degree-Based Grouping (DBG), a novel lightweight reordering technique that employs a coarse-grain reordering to largely preserve graph structure while reducing the cache footprint of hot vertices. Our evaluation on 40 combinations of various graph applications and datasets shows that, compared to a baseline with no reordering, DBG yields an average application speed-up of 16.8% vs 11.6% for the best-performing existing lightweight technique.

show abstract

Leeway: Addressing Variability in Dead-Block Prediction for Last-Level Caches

Faldu¹,

Grot²

2017

View full text Add to dashboard Cite

Abstract-The looming breakdown of Moore's Law and the end of voltage scaling are ushering a new era where neither transistors nor the energy to operate them is free. This calls for a new regime in computer systems, one in which every transistor counts. Caches are essential for processor performance and represent the bulk of modern processor's transistor budget. To get more performance out of the cache hierarchy, future processors will rely on effective cache management policies.This paper identifies variability in generational behavior of cache blocks as a key challenge for cache management policies that aim to identify dead blocks as early and as accurately as possible to maximize cache efficiency. We show that existing management policies are limited by the metrics they use to identify dead blocks, leading to low coverage and/or low accuracy in the face of variability. In response, we introduce a new metric -Live Distance -that uses the stack distance to learn the temporal reuse characteristics of cache blocks, thus enabling a dead block predictor that is robust to variability in generational behavior. Based on the reuse characteristics of an application's cache blocks, our predictor -Leeway -classifies application's behavior as streaming-oriented or reuse-oriented and dynamically selects an appropriate cache management policy. By leveraging live distance for LLC management, Leeway outperforms state-of-the-art approaches on single-and multi-core SPEC and manycore CloudSuite workloads.

show abstract

Reconsidering OS memory optimizations in the presence of disaggregated memory

Bergman

Faldu

Grot

et al. 2022

View full text Add to dashboard Cite

Tiered memory systems introduce an additional memory level with higher-than-local-DRAM access latency and require sophisticated memory management mechanisms to achieve cost-efficiency and high performance. Recent works focus on byte-addressable tiered memory architectures which offer better performance than pure swap-based systems. We observe that adding disaggregation to a byte-addressable tiered memory architecture requires important design changes that deviate from the common techniques that target lowerlatency non-volatile memory systems. Our comprehensive analysis of real workloads shows that the high access latency to disaggregated memory undermines the utility of well-established memory management optimizations. Based on these insights, we develop HotBox -a disaggregated memory management subsystem for Linux that strives to maximize the local memory hit rate with low memory management overhead. HotBox introduces only minor changes to the Linux kernel while outperforming state-of-the-art systems on memory-intensive benchmarks by up to 2.25×. CCS Concepts:• Software and its engineering → Memory management; • Hardware → Emerging architectures.

show abstract

Domain-Specialized Cache Management for Graph Analytics

Faldu

Diamond

Grot

2020

View full text Add to dashboard Cite

Graph analytics power a range of applications in areas as diverse as finance, networking and business logistics. A common property of graphs used in the domain of graph analytics is a power-law distribution of vertex connectivity, wherein a small number of vertices are responsible for a high fraction of all connections in the graph. These richly-connected, hot, vertices inherently exhibit high reuse. However, this work finds that state-of-the-art hardware cache management schemes struggle in capitalizing on their reuse due to highly irregular access patterns of graph analytics.In response, we propose GRASP, domain-specialized cache management at the last-level cache for graph analytics. GRASP augments existing cache policies to maximize reuse of hot vertices by protecting them against cache thrashing, while maintaining sufficient flexibility to capture the reuse of other vertices as needed. GRASP keeps hardware cost negligible by leveraging lightweight software support to pinpoint hot vertices, thus eliding the need for storage-intensive prediction mechanisms employed by state-of-the-art cache management schemes. On a set of diverse graph-analytic applications with large high-skew graph datasets, GRASP outperforms prior domain-agnostic schemes on all datapoints, yielding an average speed-up of 4.2% (max 9.4%) over the best-performing prior scheme. GRASP remains robust on low-/no-skew datasets, whereas prior schemes consistently cause a slowdown.

show abstract

POSTER: Domain-Specialized Cache Management for Graph Analytics

Faldu

Diamond

Grot

2019

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Priyank Faldu

A Closer Look at Lightweight Graph Reordering

Leeway: Addressing Variability in Dead-Block Prediction for Last-Level Caches

Reconsidering OS memory optimizations in the presence of disaggregated memory

Domain-Specialized Cache Management for Graph Analytics

POSTER: Domain-Specialized Cache Management for Graph Analytics

Contact Info

Product

Resources

About