2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca.2019.00048
Machine Learning at Facebook: Understanding Inference at the Edge

Cited by 327 publications (211 citation statements)
References 36 publications
“…The Odroid Xu3 implements the Exynos 5410 SoC that was released in 2014 and thus represents a low- to mid-range mobile specification. It is worth noting that a recent study published in 2019 [61] suggests that 75% of today's smartphones still use a CPU design released before 2013. Therefore, including the Odroid Xu3 in our evaluation ensures that our approach is evaluated on a platform representative of a wide range of mobile devices.…”
Section: A. Hardware and Software Platforms
confidence: 99%
“…Memory access cost increases by ∼10× when moving from an 8 kB to a 1 MB memory with a 64-bit cache [2]. In general, there is a gap between the memory storage, bandwidth, compute requirements, and energy consumption of modern DNNs and the hardware resources available on edge devices [3]. An apparent solution to address this gap is to compress such networks, thus reducing the compute requirements to match putative edge resources. Several groups have proposed new compute- and memory-efficient DNN architectures [4]–[6] and parameter-efficient neural networks, using methods such as DNN pruning [7], distillation [8], and low-precision arithmetic [9], [10].…”
Section: Introduction
confidence: 99%
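To make the compression techniques named in the excerpt above more concrete, here is a minimal NumPy sketch of two of them: magnitude-based weight pruning and symmetric int8 quantization (a simple form of low-precision arithmetic). It is an illustrative toy on a random weight matrix, not the method of any cited work; the function names and the 80% sparsity target are assumptions made here for illustration.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so that `sparsity` fraction of entries is zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_int8(weights: np.ndarray):
    """Symmetric linear quantization of float32 weights to int8 plus a single scale factor."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in for one DNN layer
    w_pruned = magnitude_prune(w, sparsity=0.8)          # hypothetical 80% sparsity target
    q, scale = quantize_int8(w_pruned)                   # int8 storage is 4x smaller than float32
    print("non-zero weights:", np.count_nonzero(w_pruned), "/", w.size)
    print("max dequantization error:", float(np.max(np.abs(q.astype(np.float32) * scale - w_pruned))))
```

Pruning reduces the number of effective parameters, while quantization reduces per-parameter storage and arithmetic cost; real deployments would combine these with retraining or calibration, which this sketch omits.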
“…Multiple-Language Smell Y [9]
s02l Undeclared Consumers Y [10]
s03a Decouple Training Pipeline from Production Pipeline [10]
s03b ML Versioning [12]
s05 Isolate and Validate Output of Model [17]
s10a Distinguish Business Logic from ML Models [17]
s10b Gateway Routing Architecture [40]
a04 Separation of Concerns and Modularization of ML Components [19]
g02a Federated Learning [19]
g02b Secure Aggregation [22]
g05 Handshake or Hand Buzzer [24]
g07a Test Infrastructure Independently from ML [24]
g07b Reuse Code between Training Pipeline and Serving Pipeline [25]
g08 Data-Algorithm-Serving-Evaluator [26]
g09 [45]
g18 Lambda Architecture
d) Motivation: ML application systems are complex systems because their ML components must be (re)trained regularly and are non-deterministic by nature. Business requirements for these systems, like those of any other system, will also change, as will the ML algorithms.…”
Section: Source
confidence: 99%
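Several patterns in this excerpt, notably "Decouple Training Pipeline from Production Pipeline" and "ML Versioning", lend themselves to a small code illustration. Below is a minimal Python sketch under hypothetical names of our own (ModelRegistry, train_and_publish, serve are not taken from the cited catalogs): training publishes an immutable, versioned artifact to a registry, and the serving path only loads a pinned version, so the two pipelines can evolve independently.

```python
import json
import pickle
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ModelRegistry:
    """Tiny file-based registry: each published model gets an immutable version directory."""
    root: Path

    def publish(self, model, metadata: dict) -> str:
        version = f"v{len(list(self.root.glob('v*'))) + 1}"
        target = self.root / version
        target.mkdir(parents=True)
        (target / "model.pkl").write_bytes(pickle.dumps(model))
        (target / "metadata.json").write_text(json.dumps(metadata))
        return version

    def load(self, version: str):
        return pickle.loads((self.root / version / "model.pkl").read_bytes())

# Training pipeline: owns data preparation and fitting, knows nothing about serving.
def train_and_publish(registry: ModelRegistry) -> str:
    model = {"threshold": 0.5}  # stand-in for a fitted model
    return registry.publish(model, {"trained_on": "example-snapshot"})

# Serving pipeline: loads a pinned version, knows nothing about training internals.
def serve(registry: ModelRegistry, version: str, features: float) -> bool:
    model = registry.load(version)
    return features > model["threshold"]

if __name__ == "__main__":
    registry = ModelRegistry(Path("./registry"))
    pinned = train_and_publish(registry)  # e.g. "v1"; serving pins this version explicitly
    print(pinned, serve(registry, pinned, features=0.7))
```

Pinning an explicit version also supports ideas such as "Isolate and Validate Output of Model": a new version can be validated and rolled out or rolled back without touching the serving code.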