Gradient boosted trees are a powerful machine learning method that supports both classification and regression and is widely used in fields requiring high-precision prediction, particularly on various types of tabular data sets. Owing to the recent growth in data sizes and attribute counts, and the demand for frequent model updates, fast and efficient training is required. FPGAs are suitable for power-efficient acceleration because they can realize domain-specific hardware architectures; however, such an accelerator must flexibly support many hyper-parameters to adapt to various dataset sizes, dataset properties, and system limitations such as memory capacity and logic capacity. We introduce a fully pipelined hardware implementation of gradient boosted tree training and a design framework that enables a versatile hardware system description, combining high performance with the flexibility to realize highly parameterized machine learning models. Experimental results show that our FPGA implementation achieves 11- to 33-times faster performance and more than 300-times higher power efficiency than a state-of-the-art GPU-accelerated software implementation.
As the scale of digital circuit designs increases, design and verification using conventional hardware description languages (HDLs), such as Verilog HDL and VHDL, limit efficiency. Consequently, high-level synthesis (HLS) and domain-specific languages (DSLs), which serve as alternatives to conventional HDLs, are beginning to garner attention. We previously proposed a design framework that uses the C language as a register-transfer-level description language. In this study, we introduce the LLVM compiler infrastructure to extend our previous work, support standard C/C++ as the input code, and aggressively optimize the circuit design. In addition to generating single modules, we extend our framework to support hierarchical module descriptions for efficient system design. We demonstrate the conversion of input C/C++ code into Verilog code, the optimization of its logic, and the construction of pipelined logic that reproduces the original behavior over multiple clock cycles. Our framework offers significantly more efficient system-level hardware design and a powerful debugging environment built on software development platforms and tool-sets. The generated hardware logic performs as well as or better than hand-written logic in conventional HDLs.