Proceedings of the 49th Annual International Symposium on Computer Architecture 2022
DOI: 10.1145/3470496.3533044
Understanding data storage and ingestion for large-scale deep recommendation model training

Abstract: Datacenter-scale AI training clusters consisting of thousands of domain-specific accelerators (DSA) are used to train increasingly complex deep learning models. These clusters rely on a data storage and ingestion (DSI) pipeline, responsible for storing exabytes of training data and serving it at tens of terabytes per second. As DSAs continue to push training efficiency and throughput, the DSI pipeline is becoming the dominating factor that constrains the overall training performance and capacity. Innovations th…

Cited by 31 publications (7 citation statements)
References 48 publications
“…Given a model, its architecture and algebraic computations are fixed; we also know which LA operators are affected by factorization [37]. To examine the relative speedup of factorization, we mainly need to inspect data redundancy [35], [37], and the interactions between physical data transfers (e.g., network and memory bandwidth) [73]. Existing solutions.…”
Section: B. Cost Estimation Challenge: To Factorize or To Materialize
confidence: 99%
“…The current erbium implementation uses dictionary encoding to reduce both the storage requirement and the online data movement. Therefore, queries must be encoded before being sent to the accelerators, just like data quantisation and normalisation in machine learning pipelines [33]. This process is carried out individually at the worker level in a pipeline manner, while the previous query batch is being executed by the FPGA kernel.…”
Section: Setup
confidence: 99%
“…This imbalance is pervasive and often quite large. For instance, a recent study by Meta of their ML pipelines [33] shows that GPUs used for training ML models are stalled up to 56 % of the time waiting for input data. They also show the increasing amount of compute power, network, and memory bandwidth needed on the CPU side to be able to match the throughput of the accelerator.…”
Section: Introduction
confidence: 99%
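The 56 % stall figure above means that, in the worst case, the accelerator does useful work at most 44 % of the time. A back-of-the-envelope sketch (with illustrative numbers, not figures from the cited study) shows how the stall fraction translates into effective throughput and the ingestion bandwidth needed to avoid stalls:

```python
# Back-of-the-envelope: how input-pipeline stalls cap training throughput.
# The sample rate and sample size below are illustrative assumptions.

def effective_throughput(peak_samples_per_s: float, stall_fraction: float) -> float:
    """Throughput after accounting for time stalled waiting on input data."""
    return peak_samples_per_s * (1.0 - stall_fraction)

def required_bandwidth_gbps(samples_per_s: float, bytes_per_sample: int) -> float:
    """Ingestion bandwidth (GB/s) needed to feed the accelerator without stalls."""
    return samples_per_s * bytes_per_sample / 1e9

peak = 1_000_000  # samples/s the accelerator could consume at full utilization
print(round(effective_throughput(peak, 0.56)))   # 440000 samples/s actually trained
print(required_bandwidth_gbps(peak, 4_000))      # 4.0 GB/s to keep the GPU fed
```

This is the imbalance the excerpt points at: as accelerator throughput grows, the CPU-side compute, network, and memory bandwidth must scale proportionally or the stall fraction rises.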
“…Thus, defined in the field of innovation performance more precisely forecasts the adoption of new products, leading to the development of innovation (domain-specific innovativeness) in specific areas, using innovative specific areas to forecast consumers' specific interest in the field of new products early adoption behavior and attitude [45]. Researchers worldwide have been adapting this model's structure to new fields and products in recent years [46][47][48][49][50][51]. However, many studies look at the product side of things, concentrating on the early degree of customers' adoption of new items while disregarding that some people only pay attention to the information about new products but do not necessarily buy them.…”
Section: Domain-Specific Innovation (DSI)
confidence: 99%