This paper presents feasibility studies on using graphics processing units (GPUs) as high-performance computing hardware alongside front-end electronics in large-scale magnetic confinement thermal fusion experiments. The objective of the research is to provide scalable, high-throughput, and low-latency measurements for the runtime X-ray diagnostic of metallic impurities in the Tungsten Environment in Steady-State Tokamak (WEST) reactor. A heterogeneous system, comprising a front-end with field-programmable gate arrays and a back-end server, was introduced to decompose workloads efficiently. It allows a comprehensive evaluation of CPUs and accelerators. In particular, a novel GPU implementation of the back-end algorithm and its performance analysis are presented.
INTRODUCTION

The development of the International Thermonuclear Experimental Reactor is currently the second largest research project in the world by expenditure. It is associated with complex interdisciplinary studies, including computer engineering. 1,2 To sustain the reaction and to achieve a positive energy balance, numerous diagnostic and control tools need to be provided. 3 Among them is the measurement of soft X-rays in the spectral range corresponding to the emission of metallic impurities. Controlling the impurities is extremely important to increase the reaction efficiency and to provide a safe mode of operation. 3,4 Without such control, metallic particles can accumulate and cause disruptions that damage the reactor. 1,4 A heterogeneous architecture based on field-programmable gate arrays (FPGAs) and a server was proposed for the impurity diagnostic. 4-6 Positive results were achieved with Intel Xeon CPUs and Intel Xeon Phi coprocessors in handling intensive computations. 5 Nevertheless, the question arose whether the solution can be further optimized with fine-grained parallel devices. Therefore, this work supplements the research with a new algorithm for a graphics processing unit (GPU) and a subsequent performance study. The objective is to maximize throughput and to reduce latency in handling large diagnostic workloads.

This paper is organized as follows. Section 2 describes related works and issues. Section 3 covers the processing outline and the system overview. Section 4 discusses the utilization of GPUs in the analysis and proposes the parallel algorithm. Section 5 presents the performance results. Conclusions and potential future works are covered in Section 6.

Concurrency Computat Pract Exper. 2020;32:e6028. wileyonlinelibrary.com/journal/cpe

FPGAs have been successfully utilized in various large-scale, high-throughput, and hard real-time physics experiments.
Nevertheless, limitations were observed in particular applications. In plasma physics, some computations are too complex to be implemented with FPGA resources, which are also required for fast transmission and utility. Moreover, modifying the algorithm is difficult and time-consuming for the front-end with custom boards and for firmware...