With the emergence of accelerator devices such as multicores, graphics-processing units (GPUs), and field-programmable gate arrays (FPGAs), application designers are confronted with the problem of searching a huge design space that has been shown to have widely varying performance and energy metrics for different accelerators, different application domains, and different use cases. To address this problem, numerous studies have evaluated specific applications across different accelerators. In this paper, we analyze an important domain of applications, referred to as sliding-window applications, when executing on FPGAs, GPUs, and multicores. For each device, we present optimization strategies and analyze use cases where each device is most effective. The results show that FPGAs can achieve speedup of up to 11x and 57x compared to GPUs and multicores, respectively, while also using orders of magnitude less energy.
Recent architectural trends have focused on increased parallelism via multicore processors and increased heterogeneity via accelerator devices (e.g., graphics-processing units, field-programmable gate arrays). Although these architectures have significant performance and energy potential, application designers face many device-specific challenges when choosing an appropriate accelerator or when customizing an algorithm for an accelerator. To help address this problem, in this article we thoroughly evaluate convolution, one of the most common operations in digital-signal processing, on multicores, graphics-processing units, and field-programmable gate arrays. Whereas many previous application studies evaluate a specific usage of an application, this article assists designers with design space exploration for numerous use cases by analyzing effects of different input sizes, different algorithms, and different devices, while also determining Pareto-optimal trade-offs between performance and energy.
The increasing usage of hardware accelerators such as Field-Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) has significantly increased application design complexity. Such complexity results from a larger design space created by numerous combinations of accelerators, algorithms, and hw/sw partitions. Exploration of this increased design space is critical due to widely varying performance and energy consumption for each accelerator when used for different application domains and different use cases. To address this problem, numerous studies have evaluated specific applications across different architectures.In this article, we analyze an important domain of applications, referred to as sliding-window applications, implemented on FPGAs, GPUs, and multicore CPUs. For each device, we present optimization strategies and analyze use cases where each device is most effective. The results show that, for large input sizes, FPGAs can achieve speedups of up to 5.6× and 58× compared to GPUs and multicore CPUs, respectively, while also using up to an order of magnitude less energy. For small input sizes and applications with frequency-domain algorithms, GPUs generally provide the best performance and energy. . 2015. A tradeoff analysis of FPGAs, GPUs, and multicores for sliding-window applications.
To accommodate the increase in energy and luminosity of the upgraded LHC, the CMS Endcap Muon Level 1 Trigger system has to be significantly modified. To provide the best track reconstruction, the Trigger system must now import all available trigger primitives generated by Cathode Strip Chambers and by other regional subsystems, such as Resistive Plate Chambers. In addition to massive input bandwidth, this also requires a significant increase in logic and memory resources.To satisfy these requirements, a new Sector Processor unit for muon track finding is being designed. This unit follows the micro-TCA standard recently adopted by CMS. It consists of three modules. The Core Logic module houses the large FPGA that contains the processing logic and multi-gigabit serial links for data exchange. The Optical module contains optical receivers and transmitters; it communicates with the Core Logic module via a custom backplane section. The Look-Up Table module contains a large amount of low-latency memory that is used to assign the final transverse momentum of the muon candidate tracks. The name of the unit -Modular Track Finder -reflects the modular approach used in the design.Presented here are the details of the hardware design of the prototype unit based on Xilinx's Virtex-6 FPGA family, MTF6, as well as results of the conducted tests. Also presented are plans for the pre-production prototype based on the Virtex-7 FPGA family, MTF7. KEYWORDS: Trigger concepts and systems (hardware and software); Large detector systems for particle and astroparticle physics; Particle tracking detectors (Gaseous detectors)
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.