MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture 2021
DOI: 10.1145/3466752.3480057
Equinox: Training (for Free) on a Custom Inference Accelerator

Abstract: DNN inference accelerators executing online services exhibit low average loads because of service demand variability, leading to poor resource utilization. Unfortunately, reclaiming idle inference cycles is difficult as other workloads cannot execute on a custom accelerator. With recent proposals for the use of fixed-point arithmetic in training, there are opportunities for training services to piggyback on inference accelerators. We make the observation that a key challenge in doing so is maintaining service…

Cited by 11 publications (7 citation statements)
References: 35 publications
“…In our work, we use the BFP number format, which provides a middle ground between FXP and FP number formats. BFP has been used for DNN inference [7], [13], [46] and training [17], [59] as it is less costly than the FP formats and achieves better accuracy than the FXP formats with the same bit-width. BFP format splits tensors into groups and assigns an exponent to each group that is shared by the elements within the group.…”
Section: B. Data Formats for DNNs
confidence: 99%
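The shared-exponent grouping described in this statement can be sketched in a few lines of NumPy. The group size, mantissa width, and function names below are illustrative assumptions, not the scheme used by Equinox or the citing work:

import numpy as np

def bfp_quantize(x, group_size=16, mantissa_bits=8):
    # Illustrative block floating-point (BFP) quantization: each group of
    # `group_size` values shares a single exponent. Assumes x.size is a
    # multiple of group_size (padding is omitted for brevity).
    x = np.asarray(x, dtype=np.float32).reshape(-1, group_size)
    # Shared exponent per group, chosen so the largest magnitude fits the mantissa.
    max_abs = np.max(np.abs(x), axis=1, keepdims=True)
    shared_exp = np.floor(np.log2(np.maximum(max_abs, 1e-38))).astype(np.int32) + 1
    # Shift every element by the shared exponent and round to the mantissa
    # width (one bit is reserved for the sign).
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    mantissas = np.clip(np.round(x / scale),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1).astype(np.int32)
    return mantissas, shared_exp

def bfp_dequantize(mantissas, shared_exp, mantissa_bits=8):
    # Reconstruct approximate float values from mantissas and the shared exponent.
    return mantissas.astype(np.float32) * 2.0 ** (shared_exp - (mantissa_bits - 1))

Only the narrow integer mantissas plus one exponent per group need to be stored and moved, which is where the cost advantage over per-element floating point comes from.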
“…Many DNN accelerator architectures are based on the variants of systolic arrays [4,12,17,31,32,39,40,53]. Google has adopted an ASIC solution and developed Tensor Processing Units (TPU) for its cloud services.…”
Section: Related Work
confidence: 99%
“…In contrast, multi-pod designs with minimally sized arrays [33] target maximum utilization. Unfortunately, these designs compromise the inference accelerator's power efficiency by over-provisioning overall on-chip memory [17] (e.g., 8x8 arrays incur 5 − 10× more memory accesses than 128 × 128 arrays). Therefore, even at high utilization, such multi-pod designs achieve inferior throughput/Watt relative to designs with coarse-grain pods [50].…”
Section: Introduction
confidence: 99%
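A rough back-of-envelope model (an illustration, not the cited paper's methodology) shows why small arrays pay this memory penalty: a K×K systolic array sustains roughly K² multiply-accumulates per cycle while fetching on the order of 2K operands per cycle from on-chip buffers, so operand fetches per MAC scale as about 2/K.

# Illustrative reuse model (an assumption, not the cited measurement):
# a K x K systolic array reuses each fetched operand across roughly K
# multiply-accumulates in a typical dataflow.
def accesses_per_mac(k: int) -> float:
    return 2.0 / k  # two operand streams, each reused about K times

for k in (8, 128):
    print(f"{k:>3}x{k:<3}: {accesses_per_mac(k):.4f} operand fetches per MAC")

# Idealized ratio between 8x8 and 128x128 arrays; real designs see a smaller
# gap (the 5-10x cited above) because of buffering and tiling effects.
print("ratio:", accesses_per_mac(8) / accesses_per_mac(128))  # 16.0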
“…As discussed earlier, effective continuous learning in an autonomous system demands not just exceptional performance but also significant energy efficiency, which motivates the exploration of strategies that can achieve enhanced performance and efficiency, without sacrificing accuracy. Low-precision arithmetic through quantization has been widely adopted to significantly reduce the computational resource demands for training and inference [15], [18], [41], [59]. Among various quantization formats, block floating point (BFP) has recently gained prominence owing to its hardware-friendly characteristics and ability to support a wide range of real values [15], [18], [59].…”
Section: Opportunities From Low-Precision Arithmetics
confidence: 99%
“…Low-precision arithmetic through quantization has been widely adopted to significantly reduce the computational resource demands for training and inference [15], [18], [41], [59]. Among various quantization formats, block floating point (BFP) has recently gained prominence owing to its hardware-friendly characteristics and ability to support a wide range of real values [15], [18], [59]. BFP groups a set of floating point values, forces them to have a shared exponent by shifting the mantissa accordingly, and stores the group of truncated mantissa bits along with the shared exponent.…”
Section: Opportunities From Low-Precision Arithmetics
confidence: 99%
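A minimal sketch of why the shared exponent is considered hardware-friendly, assuming the hypothetical BFP layout from the earlier example (integer mantissas plus one exponent per block): the dot product of two blocks reduces to an integer multiply-accumulate, with the exponents applied once per block instead of once per element.

import numpy as np

def bfp_block_dot(mant_a, exp_a, mant_b, exp_b, mantissa_bits=8):
    # Dot product of two BFP blocks: integer MACs plus a single exponent add.
    # `mant_a`/`mant_b` are integer mantissa arrays, `exp_a`/`exp_b` are the
    # blocks' shared exponents (scalars).
    acc = int(np.dot(mant_a.astype(np.int64), mant_b.astype(np.int64)))
    # Both shared exponents (and the mantissa scaling) are applied once
    # for the whole block rather than per element.
    scale = 2.0 ** (int(exp_a) + int(exp_b) - 2 * (mantissa_bits - 1))
    return acc * scale

Under these assumptions a fixed-point datapath can execute the bulk of the work; only the final rescaling touches the exponents.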