Restricted Boltzmann Machines (RBMs), the building block for newly popular Deep Belief Networks (DBNs), are a promising new tool for machine learning practitioners. However, future research in applications of DBNs is hampered by the considerable computation that training requires. In this paper, we describe a novel architecture and FPGA implementation that accelerates the training of general RBMs in a scalable manner, with the goal of producing a system that machine learning researchers can use to investigate ever-larger networks. Our design uses a highly efficient, fully pipelined architecture based on 16-bit arithmetic for performing RBM training on an FPGA. We show that 16-bit arithmetic precision is sufficient, and we consequently use embedded hardware multiply-and-add (MADD) units. We present performance results showing that a speedup of 25-30x can be achieved over an optimized software implementation on a high-end CPU.
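The abstract does not spell out the training algorithm, but RBMs are conventionally trained with contrastive divergence. The sketch below shows one CD-1 update step in NumPy, with float16 parameters standing in loosely for the design's 16-bit MADD arithmetic; the function name and argument layout are illustrative, not the paper's interface.

```python
import numpy as np

def cd1_step(v0, W, b_h, b_v, lr=0.01, rng=None):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    Illustrative NumPy sketch of the standard training step; the paper's
    FPGA pipeline realizes this with 16-bit hardware MADD units, which we
    only approximate here by keeping the parameters in float16.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x.astype(np.float32)))

    # Positive phase: hidden probabilities and samples given the data batch.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(np.float32)

    # Negative phase: one Gibbs step back to the visible layer and up again.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)

    # CD-1 gradient: data correlations minus one-step model correlations.
    n = len(v0)
    W += ((lr / n) * (v0.T @ p_h0 - p_v1.T @ p_h1)).astype(W.dtype)
    b_h += (lr * (p_h0 - p_h1).mean(axis=0)).astype(b_h.dtype)
    b_v += (lr * (v0 - p_v1).mean(axis=0)).astype(b_v.dtype)
    return W, b_h, b_v
```

Every quantity in this step is a dot product plus a bias, which is why a fully pipelined array of 16-bit multiply-and-add units maps onto it so naturally.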
Approximate computing is a promising design paradigm for crossing the CPU power wall, driven primarily by the potential to trade output quality for significant gains in performance, energy, and fault tolerance. Unfortunately, existing solutions have focused primarily on either new programming models or new hardware designs, leaving significant room between these two ends for software-based optimizations. To fill this void, additional efforts should target the compilation and runtime stages, which have a critical impact on controlling how the many approximate subcomputations interact to form a well-optimized application. This paper presents EMEURO, a neural-network (NN)-based emulation and acceleration platform. By restructuring algorithms to have the same data flow as a NN, EMEURO is able to achieve significant speedups across several domains with minimal design effort. EMEURO uses novel NN-based approximate computing techniques, including methods for efficiently searching the high-dimensional subroutine space and fine-grained control of error at runtime. EMEURO achieves maximum speedups of 7x-109x over the original algorithms with 0.1%-10% approximation error.
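As a toy illustration of the core idea (replace a subroutine with a small NN and guard it with a runtime error check), the sketch below trains a tiny NumPy MLP to stand in for an "expensive" kernel. Everything here is hypothetical: `train_surrogate`, the 0.05 error budget, and the fallback logic are made up for illustration and are not EMEURO's API or pipeline.

```python
import numpy as np

def train_surrogate(target_fn, x, hidden=16, epochs=3000, lr=0.05, seed=0):
    """Fit a tiny one-hidden-layer MLP to imitate `target_fn` on samples `x`.

    Hypothetical sketch of NN-based subroutine emulation; EMEURO's actual
    topology search and runtime error control are far more elaborate.
    """
    y = target_fn(x).reshape(len(x), 1)
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (x.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(x @ W1 + b1)            # forward pass
        err = (h @ W2 + b2) - y             # residual of the MSE loss
        dh = (err @ W2.T) * (1.0 - h**2)    # backprop through tanh
        W2 -= lr * h.T @ err / len(x); b2 -= lr * err.mean(axis=0)
        W1 -= lr * x.T @ dh / len(x); b1 -= lr * dh.mean(axis=0)
    return lambda q: np.tanh(q @ W1 + b1) @ W2 + b2

# Example: emulate an "expensive" kernel, with a runtime error check that
# falls back to the exact code when the approximation drifts too far.
expensive = lambda x: np.sin(x[:, :1]) * np.cos(x[:, 1:2])
x_train = np.random.default_rng(1).uniform(-2, 2, (2000, 2))
fast = train_surrogate(expensive, x_train)
x_test = np.random.default_rng(2).uniform(-2, 2, (200, 2))
approx = fast(x_test)
if np.max(np.abs(approx - expensive(x_test))) > 0.05:  # error budget exceeded
    approx = expensive(x_test)                         # exact fallback
```

The speedup comes from the surrogate's fixed, regular data flow (two small matrix products) replacing arbitrary control flow, while the error check captures, in miniature, the paper's fine-grained runtime error control.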
Training large transformer models is one of the most important computational challenges of modern AI. In this paper, we show how to significantly accelerate training of large transformer models by reducing activation recomputation. Activation recomputation is commonly used to work around memory capacity constraints. Rather than storing activations for backpropagation, they are traditionally recomputed, which saves memory but adds redundant compute. In this work, we show most of this redundant compute is unnecessary because we can reduce memory consumption sufficiently without it. We present two novel yet very simple techniques: sequence parallelism and selective activation recomputation. In conjunction with tensor parallelism, these techniques almost eliminate the need to recompute activations. We evaluate our approach on language models up to one trillion parameters in scale and show that our method reduces activation memory by 5x, while reducing execution time overhead from activation recomputation by over 90%. For example, when training a 530B parameter GPT-3 style model [20] on 2240 NVIDIA A100 GPUs, we achieve a Model Flops Utilization of 54.2%, which is 29% faster than the 42.1% we achieve using recomputation. Our implementation will be available in both Megatron-LM and NeMo-Megatron.
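The idea of selective activation recomputation can be approximated with PyTorch's generic activation checkpointing: checkpoint only the piece whose saved activations are large relative to their recompute cost, and keep the rest in memory. The block below is a minimal sketch of that principle; it checkpoints the whole attention core for simplicity, whereas Megatron-LM's implementation is more surgical (recomputing only parts of attention) and pairs this with sequence parallelism.

```python
import torch
from torch.utils.checkpoint import checkpoint

class Block(torch.nn.Module):
    """Transformer block with selective recomputation (illustrative sketch).

    Only the attention core is checkpointed: its activations are large but
    cheap to recompute, so we trade their memory for a small recompute cost,
    while the MLP activations are stored as usual. This mirrors the idea of
    selective activation recomputation, not Megatron-LM's exact code.
    """
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = torch.nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = torch.nn.LayerNorm(d_model)
        self.norm2 = torch.nn.LayerNorm(d_model)
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(d_model, 4 * d_model),
            torch.nn.GELU(),
            torch.nn.Linear(4 * d_model, d_model),
        )

    def _attn_core(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

    def forward(self, x):
        # Recompute only the attention activations during the backward pass.
        x = x + checkpoint(self._attn_core, self.norm1(x), use_reentrant=False)
        # MLP activations are kept in memory (no recomputation).
        x = x + self.mlp(self.norm2(x))
        return x
```

Full-layer checkpointing would rerun the MLP as well; restricting the checkpoint to the attention core is what cuts most of the recompute overhead while still releasing the bulkiest activations.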