“…Bogdan and Marculescu [29], [30] devoted to accurately quantify dynamic traffic characteristics on NoCs, then proposed a mathematical framework under nonequilibrium conditions and a statistical physics inspired approach based on nonstationary and multifractal, respectively. They presented mathematical models of workloads accurately with runtime dynamical features and exploited accumulated information for constructing prediction tools and optimization strategies [31]. Kiasari et al [17] proposed an analytical model of accurate latency for NoC.…”
Network-on-chip (NoC) is a promising communication paradigm for next-generation multiprocessor systems-on-chip (MPSoCs). As communication has become an integral part of on-chip computing, and even the performance bottleneck, researchers are paying much attention to its implementation and optimization. Traditional techniques that model communication inaccurately lead to unexpected runtime performance, which according to observation is on average 90.8% worse than the predicted results, and they are not suitable for the deep optimization of communication-intensive scenarios. In this paper, techniques are presented for NoC-based MPSoCs that integrate optimization of interprocessor communication with the objective of minimizing the schedule length. A fine-grained integer linear programming (ILP) model is proposed to properly account for communication latency under network contention; it generates runtime schedules whose performance differs only trivially from the predictions. We further propose a heuristic algorithm, unified priority-based scheduling (UPS), to effectively solve the contention problem in polynomial time by assigning priorities to messages. Evaluation results show that the solutions obtained by the ILP model outperform the state-of-the-art techniques by 31.1%, and UPS improves application performance by 34.7% and 44.4% compared with the well-known first-in-first-out (FIFO)-based and random-based methods, respectively. In addition, the results of UPS are on average within 8.3% of the optimal solutions generated by ILP. A case study on an H.264 high-definition television (HDTV) decoder and digital signal processor (DSP) filter benchmarks achieves significant improvement in performance and prediction accuracy, as well as a prominent reduction in the number of network contentions and in energy consumption.
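The idea of resolving contention by assigning priorities to messages can be illustrated with a minimal sketch. This is not the UPS algorithm from the abstract, whose priority rules are not given here; it is a generic, non-preemptive arbiter for a single shared link. The function name `schedule_link`, the message tuple layout, and the sample values are assumptions made for illustration only.

```python
import heapq

def schedule_link(messages, key):
    """Serialize messages over one shared link (hypothetical model).

    messages: list of (name, release_time, duration, priority) tuples.
    key: function mapping a message to its sort key; the ready message
         with the smallest key is sent next, without preemption.
    Returns a dict mapping each message name to its finish time.
    """
    pending = sorted(messages, key=lambda m: m[1])  # order by release time
    ready = []      # heap of (key, name, message) for released messages
    finish = {}
    t = 0           # current time on the link
    i = 0           # index of the next message not yet released
    while i < len(pending) or ready:
        # If the link is idle and nothing is ready, jump to the next release.
        if not ready and pending[i][1] > t:
            t = pending[i][1]
        # Release every message that has arrived by time t.
        while i < len(pending) and pending[i][1] <= t:
            m = pending[i]
            heapq.heappush(ready, (key(m), m[0], m))
            i += 1
        # Send the best ready message to completion.
        _, _, m = heapq.heappop(ready)
        t += m[2]
        finish[m[0]] = t
    return finish

# Assumed sample: three messages contending for the same link.
msgs = [("a", 0, 4, 1), ("b", 0, 2, 3), ("c", 1, 1, 2)]
fifo = schedule_link(msgs, key=lambda m: m[1])        # earliest release first
by_priority = schedule_link(msgs, key=lambda m: -m[3])  # highest priority first
print(fifo)
print(by_priority)
```

In this sample the FIFO order delays the two short, urgent messages behind the long one, while the priority order sends them first; the total busy time of the link is the same in both cases, so any gain in schedule length depends on which downstream tasks wait for which message.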
“…An important concept for NoC-based MPSoCs is the partitioning and scheduling. To overcome the limitation of the workload models in [6,7] and capture the intricate computational interdependencies, Xiao et al [39] described a complex network inspired modeling of applications and partitioning scheme for multicore design. Along the same lines, the work in [8] proposed a task assignment algorithm to allocate a valid processor and start time for each task and get an upper bound on the number of processors required by the tasks.…”
Unmanned Aerial Vehicles (UAVs) have rapidly become popular for monitoring, delivery, and actuation in many application domains such as environmental management, disaster mitigation, homeland security, energy, transportation, and manufacturing. However, UAV perception and navigation intelligence (PNI) designs are still in their infancy and demand fundamental performance and energy optimizations to be eligible for mass adoption. In this article, we present a generalizable three-stage optimization framework for PNI systems that (i) abstracts the high-level programs representing the perception, mining, processing, and decision making of UAVs into complex weighted networks tracking the interdependencies between universal low-level intermediate representations; (ii) exploits a differential geometry approach to schedule and map the discovered PNI tasks onto an underlying manycore architecture, where, to mine the complexity of optimal parallelization of perception and decision modules in UAVs, the design methodology relies on an Ollivier-Ricci curvature-based load-balancing strategy that detects the parallel communities of the PNI applications for maximum parallel execution while minimizing the inter-core communication; and (iii) relies on an energy-aware mapping scheme to minimize the energy dissipation when assigning the communities onto tile-based networks-on-chip. We validate this approach on various drone PNI designs, including flight controller, path planning, and visual navigation. The experimental results confirm that the proposed framework achieves a 23% flight time reduction and up to 34% energy savings for the flight controller application. In addition, the optimization on a 16-core platform improves the on-time visit rate of the path planning algorithm by 14% while reducing the run time of ConvNet visual navigation by 81%.
“…There has been a significant amount of previous research on energy-aware and load-balancing scheduling and mapping on multicore embedded systems. From a mathematical and control perspective, Bogdan et al in [4,5] provide a complex approach to dynamically characterize the workload of multicore systems for performance and power optimization. Xiao et al propose a complex network-inspired application partitioning tool to improve multicore parallelization [15].…”
In this paper, we present a load-balancing approach to analyze and partition the UAV perception and navigation intelligence (PNI) code for parallel execution, and to assign each parallel computational task to a processing element in a network-on-chip (NoC) architecture such that the total communication energy is minimized and congestion is reduced. First, we construct a data dependency graph (DDG) by converting the PNI high-level program into Low Level Virtual Machine (LLVM) Intermediate Representation (IR). Second, we propose a scheduling algorithm to partition the PNI application into clusters such that (1) inter-cluster communication is minimized, (2) NoC energy is reduced, and (3) the workloads of different cores are balanced for maximum parallel execution. Finally, an energy-aware mapping scheme is adopted to assign clusters onto tile-based NoCs. We validate this approach with a drone self-navigation application, and the experimental results show that we can achieve up to 8.4x energy reduction and 10.5x performance speedup.
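The final step, assigning clusters to tiles so that communication energy is low, can be sketched under a simple cost model. The abstract does not give its energy model, so the sketch assumes one: energy is taken to be proportional to traffic volume times the Manhattan hop distance between tiles on a 2D mesh. The names `hops`, `comm_cost`, and `best_mapping`, the traffic values, and the exhaustive search are all illustrative assumptions, and the search is practical only for very small meshes.

```python
from itertools import permutations

def hops(tile_a, tile_b, width):
    """Manhattan distance between two tiles of a 2D mesh (row-major ids)."""
    ax, ay = tile_a % width, tile_a // width
    bx, by = tile_b % width, tile_b // width
    return abs(ax - bx) + abs(ay - by)

def comm_cost(mapping, traffic, width):
    """Assumed energy proxy: sum of volume * hop count over cluster pairs.

    mapping[c] is the tile that cluster c is placed on.
    traffic maps (src_cluster, dst_cluster) to a data volume.
    """
    return sum(vol * hops(mapping[s], mapping[d], width)
               for (s, d), vol in traffic.items())

def best_mapping(num_clusters, traffic, width, height):
    """Try every placement of clusters on distinct tiles; keep the cheapest."""
    best_cost, best_map = None, None
    for perm in permutations(range(width * height), num_clusters):
        cost = comm_cost(perm, traffic, width)
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, perm
    return best_cost, best_map

# Assumed sample: four clusters with chain-like traffic on a 2x2 mesh.
traffic = {(0, 1): 10, (1, 2): 5, (0, 3): 1}
cost, mapping = best_mapping(4, traffic, width=2, height=2)
print(cost, mapping)
```

Because the sample traffic forms a chain of four clusters, every communicating pair can be placed on neighboring tiles of the 2x2 mesh, so the minimum cost equals the total traffic volume. A real flow would replace the exhaustive search with a heuristic, since the number of placements grows factorially with the tile count.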