Developing and fine-tuning software programs for heterogeneous hardware such as CPU/GPU processing platforms comprise a highly complex endeavor that demands considerable time and effort of software engineers and requires evaluating various fundamental components and features of both the design and of the platform to maximize the overall performance. The dataflow programming approach has proven to be an appropriate methodology for reaching such a difficult and complex goal for the intrinsic portability and the possibility of easily decomposing a network of actors on different processing units of the heterogeneous hardware. Nonetheless, such a design method might not be enough on its own to achieve the desired performance goals, and supporting tools are useful to be able to efficiently explore the design space so as to optimize the desired performance objectives. This article presents a methodology composed of several stages for enhancing the performance of dataflow software developed in RVC-CAL and generating low-level implementations to be executed on GPU/CPU heterogeneous hardware platforms. The stages are composed of a method for the efficient scheduling of parallel CUDA partitions, an optimization of the performance of the data transmission tasks across computing kernels, and the exploitation of dynamic programming for introducing SIMD-capable graphics processing unit systems. The methodology is validated on both the quantitative and qualitative side by means of dataflow software application examples running on platforms according to various different mapping configurations.
This paper presents new optimization approaches aiming at reducing the impact of memory accesses on the performance of dataflow programs. The approach is based on introducing a high level management of composite data types in dynamic dataflow programming language for the memory processing of data tokens. It does not require essential changes to the model of computation (MOC) or to the dataflow program itself. The objective of the approach is to remove the unnecessary constraints of memory isolations without introducing limitations to the scalability and composability properties of the dataflow paradigm. Thus the identified optimizations allow to keep the same design and programming philosophy of dataflow, whereas aiming at improving the performance of the specific configuration implementation. The different optimizations can be integrated into the current RVC-CAL design flows and synthesis tools and can be applied to different sub-networks partitions of the dataflow program. The paper introduces the context, the definition of the optimization problem and describes how it can be applied to dataflow designs. Some examples of the optimizations are provided.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.