Loop fusion improves data locality and reduces synchronization in data-parallel applications. However, loop fusion is not always legal. Even when legal, fusion may introduce loop-carried dependences which prevent parallelism. In addition, performance losses result from cache conflicts in fused loops. In this paper, we present new techniques to: 1) allow fusion of loop nests in the presence of fusion-preventing dependences, 2) maintain parallelism and allow the parallel execution of fused loops with minimal synchronization, and 3) eliminate cache conflicts in fused loops. We describe algorithms for implementing these techniques in compilers. The techniques are evaluated on a 56-processor KSR2 multiprocessor and on a 16-processor Convex SPP-1000 multiprocessor. The results demonstrate performance improvements for both kernels and complete applications. The results also indicate that careful evaluation of the profitability of fusion is necessary as more processors are used.
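To make the notion of a fusion-preventing dependence concrete, the following is a minimal sketch and is not taken from the paper. It assumes a hypothetical pair of loops in which the second loop reads `a[i + 1]`, a value the first loop has not yet produced at iteration `i`. Naive fusion therefore reads stale data. Shifting the consumer by one iteration and peeling the first producer iteration is one well-known way to make fusion legal; it is shown here only as an illustration and is not claimed to be the authors' algorithm. All function and array names are assumptions.

```python
def unfused(b):
    """Reference version: two separate loops (hypothetical example)."""
    n = len(b)
    a = [0] * n
    c = [0] * max(n - 1, 0)
    for i in range(n):          # producer loop
        a[i] = b[i] + 1
    for i in range(n - 1):      # consumer loop reads a[i + 1]
        c[i] = a[i + 1] * 2
    return a, c


def fused_naive(b):
    """Illegal fusion: c[i] reads a[i + 1] before it has been written."""
    n = len(b)
    a = [0] * n
    c = [0] * max(n - 1, 0)
    for i in range(n):
        a[i] = b[i] + 1
        if i < n - 1:
            c[i] = a[i + 1] * 2  # a[i + 1] still holds its initial 0
    return a, c


def fused_shifted(b):
    """Legal fusion: the consumer is shifted by one iteration."""
    n = len(b)
    if n == 0:
        return [], []
    a = [0] * n
    c = [0] * (n - 1)
    a[0] = b[0] + 1              # peeled first producer iteration
    for i in range(1, n):
        a[i] = b[i] + 1          # producer for iteration i
        c[i - 1] = a[i] * 2      # consumer for iteration i - 1
    return a, c


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(unfused(data) == fused_shifted(data))
    print(unfused(data) == fused_naive(data))
```

In the shifted version, each value of `a` is consumed in the same iteration that produces it, which is the locality benefit that fusion is meant to provide.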
This paper describes multiprocessor enhancements of the SimpleScalar tool set. The core simulation code has been modified to support multiprocessing, and a run-time library has been introduced for thread creation and synchronization. Measurements using the SPLASH-2 parallel benchmark suite [13] indicate that the multiprocessor enhancements introduce a simulation overhead of 30%-50% relative to the original uniprocessor simulator. An idealized multiprocessor cache simulator and a dynamic visualization tool for cache coherence have also been developed. These multiprocessor enhancements are available at the WWW site for the SimpleScalar tool set.
Abstract: Negative bias temperature instability (NBTI) significantly affects nanoscale integrated circuit performance and reliability. The degradation in threshold voltage (V_th) due to NBTI is further affected by the initial value of V_th from fabrication-induced process variation (PV). Addressing these challenges in embedded FPGA designs is possible, as FPGA reconfigurability can be exploited to measure the exact timing degradation of an FPGA due to the joint effect of NBTI and PV at run time with low overhead. The gathered information can then be used to improve the run-time performance and reliability of FPGA designs without targeting the pessimistic worst case. In this paper, we present joint NBTI/PV-aware placement techniques for FPGAs, including NBTI/PV-aware timing analysis, region-based delay estimation, and a new move-acceptance procedure. To evaluate the proposed techniques, we combine PV measurements from 15 Xilinx Virtex-II Pro FPGAs with a model of NBTI. The proposed techniques reduce the effect of NBTI/PV by more than 60% for over 60% of the 15 FPGA chips used in the experiments, with a typical run-time overhead of 1.4-1.8X. The standalone move-acceptance procedure also produces good results with negligible run-time overhead, making it suitable for online FPGA compilation and optimization flows.
I. INTRODUCTION
Negative bias temperature instability (NBTI) is a leading reliability challenge at the nano-scale level that causes degradation in the threshold voltage (V_th) of a PMOS transistor, which gradually increases delay. NBTI occurs when a PMOS transistor is stressed under high temperatures with V_gs = −V_dd, causing a high oxide electric field (E_ox); this stress causes some Si-H bonds on the Si-SiO2 interface to break, leaving unpaired valence electrons in Si atoms. These broken bonds are called interface traps. The existence of such traps increases the absolute value of V_th in PMOS transistors.
NBTI degradation is further affected by the initial value of V_th, which may deviate from the nominal value due to process variation (PV). The initial value of V_th determines the amount of E_ox; the smaller its initial value, the higher E_ox, which increases NBTI degradation. However, the joint NBTI/PV effect, i.e., combining initial V_th with the expected degradation due to NBTI, shows that variation in V_th is always the dominating factor, which means that transistors with smaller initial V_th will have smaller V_th even after NBTI degradation.
Variations in V_th are expected to further increase according to the ITRS [1]. Addressing the joint NBTI/PV effect becomes increasingly critical.
The regular structure of FPGAs and their reconfigurability can be exploited to measure the joint effect of PV and NBTI at run time. The results can be used to optimize circuit placement and routing with awareness of NBTI and PV effects to improve