Caroline Trippel scite author profile

Neural personalized recommendation models are used across a wide variety of datacenter applications including search, social media, and entertainment. State-of-the-art models comprise large embedding tables that have billions of parameters requiring large memory capacities. Unfortunately, large and fast DRAM-based memories levy high infrastructure costs. Conventional SSD-based storage solutions offer an order of magnitude larger capacity, but have worse read latency and bandwidth, degrading inference performance. RecSSD is a near data processing based SSD memory system customized for neural recommendation inference that reduces end-to-end model inference latency by 2× compared to using COTS SSDs across eight industry-representative models. CCS CONCEPTS• Hardware → External storage; • Computer systems organization → Neural networks.

show abstract

RecShard: statistical feature-based memory optimization for industry-scale neural recommendation

Sethi

Acun²,

Agarwal³

et al. 2022

View full text Add to dashboard Cite

We propose RecShard, a fine-grained embedding table (EMB) partitioning and placement technique for deep learning recommendation models (DLRMs). RecShard is designed based on two key observations. First, not all EMBs are equal, nor all rows within an EMB are equal in terms of access patterns. EMBs exhibit distinct memory characteristics, providing performance optimization opportunities for intelligent EMB partitioning and placement across a tiered memory hierarchy. Second, in modern DLRMs, EMBs function as hash tables. As a result, EMBs display interesting phenomena, such as the birthday paradox, leaving EMBs severely under-utilized. Rec-Shard determines an optimal EMB sharding strategy for a set of EMBs based on training data distributions and model characteristics, along with the bandwidth characteristics of the underlying tiered memory hierarchy. In doing so, RecShard achieves over 6 times higher EMB training throughput on average for capacity constrained DLRMs. The throughput increase comes from improved EMB load balance by over 12 times and from the reduced access to the slower memory by over 87 times.

show abstract

ArMOR

Lustig

Trippel

Pellauer

et al. 2015

View full text Add to dashboard Cite

Architectural heterogeneity is increasing: numerous products and studies have proven the benefits of combining cores and accelerators with varying ISAs into a single system. However, an underappreciated barrier to unlocking the full potential of heterogeneity is the need to specify and to reconcile differences in memory consistency models across layers of the hardware-software stack and among on-chip components.This paper presents ArMOR, a framework for specifying, comparing, and translating between memory consistency models. ArMOR defines MOSTs, an architecture-independent and precise format for specifying the semantics of memory ordering requirements such as preserved program order or explicit fences. MOSTs allow any two consistency models to be directly and algorithmically compared, and they help avoid many of the pitfalls of traditional consistency model analysis. As a case study, we use ArMOR to automatically generate translation modules called shims that dynamically translate code compiled for one memory model to execute on hardware implementing a different model.

show abstract

Porcupine: a synthesizing compiler for vectorized homomorphic encryption

Cowan

Dangwal

Alaghi

et al. 2021

View full text Add to dashboard Cite

Homomorphic encryption (HE) is a privacy-preserving technique that enables computation directly on encrypted data. Despite its promise, HE has seen limited use due to performance overheads and compilation challenges. Recent work has made significant advances to address the performance overheads but automatic compilation of efficient HE kernels remains relatively unexplored.This paper presents Porcupine, an optimizing compiler that generates vectorized HE code using program synthesis. HE poses three major compilation challenges: it only supports a limited set of SIMD-like operators, it uses long-vector operands, and decryption can fail if ciphertext noise growth is not managed properly. Porcupine captures the underlying HE operator behavior so that it can automatically reason about the complex trade-offs imposed by these challenges to generate optimized, verified HE kernels. To improve synthesis time, we propose a series of optimizations including a sketch design tailored to HE to narrow the program search space. We evaluate Porcupine using a set of kernels and show speedups of up to 52% (25% geometric mean) compared to heuristic-driven hand-optimized kernels. Analysis of Porcupine's synthesized code reveals that optimal solutions are not always intuitive, underscoring the utility of automated reasoning in this domain.

show abstract

Synthesizing Formal Models of Hardware from RTL for Efficient Verification of Memory Model Implementations

Hsiao

Mulligan

Nikoleris

et al. 2021

View full text Add to dashboard Cite

Full-Stack Memory Model Verification with TriCheck

et al. 2018

View full text Add to dashboard Cite

ILA-MCM: Integrating Memory Consistency Models with Instruction-Level Abstractions for Heterogeneous System-on-Chip Verification

Zhang

Trippel

Manerkar

et al. 2018

View full text Add to dashboard Cite

12 3 4

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.