As Multiprocessor Systems-on-Chip (MPSoCs) become widely adopted in embedded systems, communication architecture analysis for MPSoCs grows ever more complex. There is a growing need for fast and accurate performance estimation techniques for on-chip bus architectures. This paper presents a novel, fast, statistics-based bus stall prediction model that estimates the effect of bus-contention stalls on the cycle count of an application program on a subject MPSoC architecture. Our technique fills the gap left by existing bus performance estimation techniques, which are either not accurate enough (e.g., static techniques) or too slow for iterative analysis (e.g., cycle-by-cycle arbitration simulation on every bus access). First, we formulate a "single blocking model" that models the blocking of a single bus request by a single bus transfer on one other bus master at a time. We then augment this model with a "burst blocking model" that models the bus stalls incurred by burst bus transfers. Together, these two models give a very fast way to predict bus stalls on an MPSoC bus. Each processor in the system is assumed to have a distinct fixed priority, and arbitration is based on priority. The proposed technique uses accumulated "workload statistics" to accurately predict the "stall cycle counts" caused by bus contention. This eliminates the need to simulate arbitration on every bus access, resulting in a substantial speed-up. The proposed technique is verified by experiments on synthetic traffic generators and on the following benchmark applications: Newton-Euler dynamic control calculation for the 6-degrees-of-freedom Stanford manipulator, a random sparse matrix solver for electronic circuit simulation, a Fast Fourier Transform with 1024 complex-number inputs, and SPEC95 Fpppp, a chemical program that computes multi-electron integral derivatives.
Experimental results show that the proposed method delivers speed-up factors of 1.33, 1.7, 74, and 6 over the simulation method for the four benchmark applications, respectively, while the average estimation error is 7% for the Fast Fourier Transform benchmark and under 1% for the other benchmarks.
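To make the baseline concrete, the following is a minimal sketch of the kind of cycle-by-cycle, fixed-priority arbitration simulation that the statistical model is meant to avoid. It is not the authors' simulator. The `Request` type, the `simulate_stalls` function, and the trace layout are hypothetical, and the sketch assumes a non-preemptive bus, absolute issue times that are not shifted by earlier stalls, and requests from the same master that do not overlap.

```python
from dataclasses import dataclass


@dataclass
class Request:
    issue_cycle: int  # cycle at which the master asserts its bus request
    length: int       # bus cycles the transfer occupies once granted


def simulate_stalls(requests_by_master):
    """Return the stall cycles of each master on a shared bus.

    requests_by_master[m] is a list of Request sorted by issue_cycle.
    A lower index m means a higher fixed priority (an assumption of this
    sketch). The bus is non-preemptive: a running transfer always finishes.
    """
    queues = [list(reqs) for reqs in requests_by_master]
    heads = [0] * len(queues)   # index of each master's next request
    stalls = [0] * len(queues)  # accumulated stall cycles per master
    busy_until = 0              # first cycle at which the bus is free
    cycle = 0

    while any(h < len(q) for h, q in zip(heads, queues)):
        # Masters whose next request has been issued but not yet granted.
        pending = [
            m for m, q in enumerate(queues)
            if heads[m] < len(q) and q[heads[m]].issue_cycle <= cycle
        ]
        if pending and cycle >= busy_until:
            winner = pending[0]  # lowest index = highest priority
            busy_until = cycle + queues[winner][heads[winner]].length
            heads[winner] += 1
            pending.remove(winner)
        # Every master still waiting loses this cycle to bus contention.
        for m in pending:
            stalls[m] += 1
        cycle += 1

    return stalls


if __name__ == "__main__":
    trace = [
        [Request(0, 4)],  # master 0, highest priority
        [Request(1, 2)],  # master 1
        [Request(2, 2)],  # master 2, lowest priority
    ]
    print(simulate_stalls(trace))
```

The loop touches every cycle and re-evaluates arbitration each time, which is why this style of simulation becomes slow for long traces and motivates predicting stall counts from accumulated statistics instead.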
Accurate and fast performance estimation methods for current and future multi-core systems are the focus of much research because of the complexity of such architectures. The communication architecture of these systems has a large impact on the performance and power consumption of the whole system. Architects need to explore many design possibilities with performance estimation techniques early in the design cycle so that design decisions can be made sooner, while software developers need to develop and test applications for the target architecture and gather performance measurements as early in the design cycle as possible. Full-system simulation techniques provide accurate performance values but are extremely time consuming. Static analysis techniques are fast but cannot capture the dynamic behavior associated with shared-resource contention and arbitration. Synthetic traffic patterns have also been used to analyze communication architectures; however, such patterns are not realistic enough. We propose a statistics-based model to predict the dynamic cost of bus arbitration on the performance of a bus architecture. The proposed model uses workload traces of actual applications and benchmarks to capture real application traffic behavior. Statistics on the traffic patterns are collected and fed to the analytical model, which calculates performance values for the communication architecture under consideration. Knowing these performance measures, designers can avoid both over-design and under-design of the communication architecture. This paper builds on a previously developed performance estimation model. The previous work modeled single and burst bus transfers; however, it considered only one interfering bus master at a time for each blocked bus request.
The proposed improved-accuracy model considers multiple interfering masters for each blocked request, which improves the estimation accuracy, especially for traffic-intensive applications and architectures with many processing elements (PEs). Experiments are performed on two architectures: 4 PEs connected via a shared bus and 8 PEs connected via a shared bus. Results show no significant difference in accuracy from the previously developed model for the low-traffic applications SPARSE and ROBOT, but a notable accuracy improvement for traffic-intensive applications. On the 4-PE architecture, the maximum estimation error is reduced from 1.75% to 0.6% for FPPPP and from 13.91% to 8.8% for FFT. On the 8-PE architecture, the maximum estimation error is reduced from 11.8% to 2.7% for the FPPPP benchmark. The simulation speed-up of the proposed technique over the simulation method is also reported.
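The abstract states that statistics on traffic patterns are collected from workload traces and fed to the analytical model, but it does not specify which statistics or how they are grouped. As an illustration only, the sketch below accumulates per-master request counts and bus-occupancy cycles over fixed-size time windows. The trace format `(cycle, master_id, length)`, the windowing scheme, and the name `collect_window_stats` are all assumptions and are not taken from the paper.

```python
from collections import defaultdict


def collect_window_stats(trace, window):
    """Accumulate per-master traffic statistics over fixed time windows.

    trace  : iterable of (cycle, master_id, length) tuples, where length is
             the number of bus cycles the transfer occupies (hypothetical
             format chosen for this sketch).
    window : window size in cycles (an assumed parameter).

    Returns {window_index: {master_id: {"requests": n, "busy_cycles": c}}}.
    A transfer is attributed entirely to the window in which it is issued.
    """
    stats = defaultdict(
        lambda: defaultdict(lambda: {"requests": 0, "busy_cycles": 0})
    )
    for cycle, master, length in trace:
        entry = stats[cycle // window][master]
        entry["requests"] += 1
        entry["busy_cycles"] += length
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {
        w: {m: dict(e) for m, e in per_master.items()}
        for w, per_master in stats.items()
    }


if __name__ == "__main__":
    trace = [(0, 0, 4), (1, 1, 2), (5, 0, 4), (12, 1, 8)]
    for w, per_master in sorted(collect_window_stats(trace, window=10).items()):
        print(w, per_master)
```

Summaries of this kind are far smaller than the trace itself, which is what would let an analytical model estimate contention without replaying every bus access.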