Cyber-physical systems (CPS) aim to monitor and control complex real-world phenomena, where computational cost and real-time constraints can be a major challenge. Manycore hardware accelerators such as graphics processing units (GPUs) promise to accelerate computation by leveraging the data parallelism often found in real-world CPS scenarios, but performance is limited by the overhead of transferring data between host and device memory. For example, plasma control in the HBT-EP Tokamak device at Columbia University [11,18] must execute the control algorithm in a few microseconds, yet copying the data set between host and device memory may take tens of microseconds. This paper presents a zero-copy I/O processing scheme that maps the I/O address space of the system to the virtual address space of the compute device, allowing sensors and actuators to transfer data to and from the compute device directly. Experiments using the plasma control system show a 33% reduction in computational cost, and microbenchmarks with more generic matrix operations show a 34% reduction, while in both cases effective data throughput remains at least as good as that of the current best performers.
Abstract: Graphics processing units (GPUs) are increasingly being used for general-purpose parallel computing. They provide significant performance gains over multi-core CPU systems and are an easily accessible alternative to supercomputers. The architecture of general-purpose GPU (GPGPU) systems, however, poses challenges in efficiently transferring data between the host and device(s). Although commodity manycore devices such as NVIDIA GPUs provide more than one way to move data around, it is unclear which method is most effective for a particular application. This makes it difficult to support latency-sensitive cyber-physical systems (CPS). In this work we present a new approach to data transfer in a heterogeneous computing system that allows direct communication between GPUs and other I/O devices. In addition to adding this functionality, our system also improves communication between the GPU and host. We analyze the current vendor-provided data communication mechanisms and identify which methods work best for particular tasks with respect to throughput and total time to completion. Our method allows a new class of real-time cyber-physical applications to be implemented on a GPGPU system. The results of the experiments presented here show that GPU tasks can be completed in 34 percent less time than with current methods. Furthermore, effective data throughput is at least as good as that of the current best performers. This work is part of the concurrent development of Gdev [6], an open-source project that provides Linux operating system support for many-core device resource management.