MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture 2021
DOI: 10.1145/3466752.3480057
Equinox: Training (for Free) on a Custom Inference Accelerator

Abstract: DNN inference accelerators executing online services exhibit low average loads because of service demand variability, leading to poor resource utilization. Unfortunately, reclaiming idle inference cycles is difficult as other workloads cannot execute on a custom accelerator. With recent proposals for the use of fixed-point arithmetic in training, there are opportunities for training services to piggyback on inference accelerators. We make the observation that a key challenge in doing so is maintaining service…

Cited by 11 publications (7 citation statements)
References: 35 publications
“…In our work, we use the BFP number format, which provides a middle ground between FXP and FP number formats. BFP has been used for DNN inference [7], [13], [46] and training [17], [59] as it is less costly than the FP formats and achieves better accuracy than the FXP formats with the same bit-width. BFP format splits tensors into groups and assigns an exponent to each group that is shared by the elements within the group.…”
Section: B. Data Formats for DNNs
confidence: 99%
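The shared-exponent grouping described in this statement can be sketched in a few lines of NumPy. The group size, mantissa width, and function names below are illustrative assumptions, not the scheme used by Equinox or the citing work:

import numpy as np

def bfp_quantize(x, group_size=16, mantissa_bits=8):
    # Illustrative block floating-point (BFP) quantization: each group of
    # `group_size` values shares a single exponent. Assumes x.size is a
    # multiple of group_size (padding is omitted for brevity).
    x = np.asarray(x, dtype=np.float32).reshape(-1, group_size)
    # Shared exponent per group, chosen so the largest magnitude fits the mantissa.
    max_abs = np.max(np.abs(x), axis=1, keepdims=True)
    shared_exp = np.floor(np.log2(np.maximum(max_abs, 1e-38))).astype(np.int32) + 1
    # Shift every element by the shared exponent and round to the mantissa
    # width (one bit is reserved for the sign).
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    mantissas = np.clip(np.round(x / scale),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1).astype(np.int32)
    return mantissas, shared_exp

def bfp_dequantize(mantissas, shared_exp, mantissa_bits=8):
    # Reconstruct approximate float values from mantissas and the shared exponent.
    return mantissas.astype(np.float32) * 2.0 ** (shared_exp - (mantissa_bits - 1))

Only the narrow integer mantissas plus one exponent per group need to be stored and moved, which is where the cost advantage over per-element floating point comes from.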
“…Many DNN accelerator architectures are based on the variants of systolic arrays [4,12,17,31,32,39,40,53]. Google has adopted an ASIC solution and developed Tensor Processing Units (TPU) for its cloud services.…”
Section: Related Work
confidence: 99%
“…In contrast, multi-pod designs with minimally sized arrays [33] target maximum utilization. Unfortunately, these designs compromise the inference accelerator's power efficiency by over-provisioning overall on-chip memory [17] (e.g., 8x8 arrays incur 5 − 10× more memory accesses than 128 × 128 arrays). Therefore, even at high utilization, such multi-pod designs achieve inferior throughput/Watt relative to designs with coarse-grain pods [50].…”
Section: Introduction
confidence: 99%
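A rough back-of-envelope model (an illustration, not the cited paper's methodology) shows why small arrays pay this memory penalty: a K×K systolic array sustains roughly K² multiply-accumulates per cycle while fetching on the order of 2K operands per cycle from on-chip buffers, so operand fetches per MAC scale as about 2/K.

# Illustrative reuse model (an assumption, not the cited measurement):
# a K x K systolic array reuses each fetched operand across roughly K
# multiply-accumulates in a typical dataflow.
def accesses_per_mac(k: int) -> float:
    return 2.0 / k  # two operand streams, each reused about K times

for k in (8, 128):
    print(f"{k:>3}x{k:<3}: {accesses_per_mac(k):.4f} operand fetches per MAC")

# Idealized ratio between 8x8 and 128x128 arrays; real designs see a smaller
# gap (the 5-10x cited above) because of buffering and tiling effects.
print("ratio:", accesses_per_mac(8) / accesses_per_mac(128))  # 16.0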
“…As discussed earlier, effective continuous learning in an autonomous system demands not just exceptional performance but also significant energy efficiency, which motivates the exploration of strategies that can achieve enhanced performance and efficiency, without sacrificing accuracy. Low-precision arithmetic through quantization has been widely adopted to significantly reduce the computational resource demands for training and inference [15], [18], [41], [59]. Among various quantization formats, block floating point (BFP) has recently gained prominence owing to its hardware-friendly characteristics and ability to support a wide range of real values [15], [18], [59].…”
Section: Opportunities From Low-Precision Arithmetics
confidence: 99%
“…Low-precision arithmetic through quantization has been widely adopted to significantly reduce the computational resource demands for training and inference [15], [18], [41], [59]. Among various quantization formats, block floating point (BFP) has recently gained prominence owing to its hardware-friendly characteristics and ability to support a wide range of real values [15], [18], [59]. BFP groups a set of floating point values, forces them to have a shared exponent by shifting the mantissa accordingly, and stores the group of truncated mantissa bits along with the shared exponent.…”
Section: Opportunities From Low-Precision Arithmetics
confidence: 99%
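A minimal sketch of why the shared exponent is considered hardware-friendly, assuming the hypothetical BFP layout from the earlier example (integer mantissas plus one exponent per block): the dot product of two blocks reduces to an integer multiply-accumulate, with the exponents applied once per block instead of once per element.

import numpy as np

def bfp_block_dot(mant_a, exp_a, mant_b, exp_b, mantissa_bits=8):
    # Dot product of two BFP blocks: integer MACs plus a single exponent add.
    # `mant_a`/`mant_b` are integer mantissa arrays, `exp_a`/`exp_b` are the
    # blocks' shared exponents (scalars).
    acc = int(np.dot(mant_a.astype(np.int64), mant_b.astype(np.int64)))
    # Both shared exponents (and the mantissa scaling) are applied once
    # for the whole block rather than per element.
    scale = 2.0 ** (int(exp_a) + int(exp_b) - 2 * (mantissa_bits - 1))
    return acc * scale

Under these assumptions a fixed-point datapath can execute the bulk of the work; only the final rescaling touches the exponents.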