2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca.2019.00048
Machine Learning at Facebook: Understanding Inference at the Edge

Cited by 327 publications (211 citation statements)
References 36 publications
“…The Odroid Xu3 implements the Exynos 5410 SoC that was released in 2014 and thus represents a low- to mid-range mobile specification. It is worth noting that a recent study published in 2019 [61] suggests that 75% of today's smartphones still use a CPU design released before 2013. Therefore, including the Odroid Xu3 in our evaluation ensures that our approach is evaluated on a platform representative of a wide range of mobile devices.…”
Section: A. Hardware and Software Platforms
confidence: 99%
“…Memory access cost increases by ∼10× when moving from an 8 kB to a 1 MB memory with a 64-bit cache [2]. In general, there is a gap between the memory storage, bandwidth, compute requirements, and energy consumption of modern DNNs and the hardware resources available on edge devices [3]. An apparent solution to address this gap is to compress such networks, thus reducing the compute requirements to match putative edge resources. Several groups have proposed new compute- and memory-efficient DNN architectures [4]–[6] and parameter-efficient neural networks, using methods such as DNN pruning [7], distillation [8], and low-precision arithmetic [9], [10].…”
Section: Introduction
confidence: 99%
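To make the compression techniques named in the excerpt above more concrete, here is a minimal NumPy sketch of two of them: magnitude-based weight pruning and symmetric int8 quantization (a simple form of low-precision arithmetic). It is an illustrative toy on a random weight matrix, not the method of any cited work; the function names and the 80% sparsity target are assumptions made here for illustration.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so that `sparsity` fraction of entries is zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_int8(weights: np.ndarray):
    """Symmetric linear quantization of float32 weights to int8 plus a single scale factor."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in for one DNN layer
    w_pruned = magnitude_prune(w, sparsity=0.8)          # hypothetical 80% sparsity target
    q, scale = quantize_int8(w_pruned)                   # int8 storage is 4x smaller than float32
    print("non-zero weights:", np.count_nonzero(w_pruned), "/", w.size)
    print("max dequantization error:", float(np.max(np.abs(q.astype(np.float32) * scale - w_pruned))))
```

Pruning reduces the number of effective parameters, while quantization reduces per-parameter storage and arithmetic cost; real deployments would combine these with retraining or calibration, which this sketch omits.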
“…Multiple-Language Smell Y [9]
s02l Undeclared Consumers Y [10]
s03a Decouple Training Pipeline from Production Pipeline [10]
s03b ML Versioning [12]
s05 Isolate and Validate Output of Model [17]
s10a Distinguish Business Logic from ML Models [17]
s10b Gateway Routing Architecture [40]
a04 Separation of Concerns and Modularization of ML Components [19]
g02a Federated Learning [19]
g02b Secure Aggregation [22]
g05 Handshake or Hand Buzzer [24]
g07a Test Infrastructure Independently from ML [24]
g07b Reuse Code between Training Pipeline and Serving Pipeline [25]
g08 Data-Algorithm-Serving-Evaluator [26]
g09 [45]
g18 Lambda Architecture
d) Motivation: ML application systems are complex systems because their ML components must be (re)trained regularly and are non-deterministic by nature. Business requirements for these systems, like those of any other system, will also change, as will the ML algorithms.…”
Section: Source
confidence: 99%
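Several patterns in this excerpt, notably "Decouple Training Pipeline from Production Pipeline" and "ML Versioning", lend themselves to a small code illustration. Below is a minimal Python sketch under hypothetical names of our own (ModelRegistry, train_and_publish, serve are not taken from the cited catalogs): training publishes an immutable, versioned artifact to a registry, and the serving path only loads a pinned version, so the two pipelines can evolve independently.

```python
import json
import pickle
from dataclasses import dataclass
from pathlib import Path

@dataclass
class ModelRegistry:
    """Tiny file-based registry: each published model gets an immutable version directory."""
    root: Path

    def publish(self, model, metadata: dict) -> str:
        version = f"v{len(list(self.root.glob('v*'))) + 1}"
        target = self.root / version
        target.mkdir(parents=True)
        (target / "model.pkl").write_bytes(pickle.dumps(model))
        (target / "metadata.json").write_text(json.dumps(metadata))
        return version

    def load(self, version: str):
        return pickle.loads((self.root / version / "model.pkl").read_bytes())

# Training pipeline: owns data preparation and fitting, knows nothing about serving.
def train_and_publish(registry: ModelRegistry) -> str:
    model = {"threshold": 0.5}  # stand-in for a fitted model
    return registry.publish(model, {"trained_on": "example-snapshot"})

# Serving pipeline: loads a pinned version, knows nothing about training internals.
def serve(registry: ModelRegistry, version: str, features: float) -> bool:
    model = registry.load(version)
    return features > model["threshold"]

if __name__ == "__main__":
    registry = ModelRegistry(Path("./registry"))
    pinned = train_and_publish(registry)  # e.g. "v1"; serving pins this version explicitly
    print(pinned, serve(registry, pinned, features=0.7))
```

Pinning an explicit version also supports ideas such as "Isolate and Validate Output of Model": a new version can be validated and rolled out or rolled back without touching the serving code.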