Within-die process variation causes individual cores in a ChipMultiprocessor (CMP) to differ substantially in both static powerconsumed and maximum frequency supported. In this environment,ignoring variation effects whenscheduling applications or when managing power withDynamic Voltage and Frequency Scaling (DVFS) is suboptimal. This paper proposes variation-aware algorithms for applicationscheduling and power management. One such power managementalgorithm, called {\em LinOpt}, uses linear programmingto find the best voltage and frequency levels for each of thecores in the CMP --- maximizing throughput at a given power budget.In a 20-core CMP, the combination of variation-awareapplication scheduling and {\em LinOpt} increases the averagethroughput by 12--17\% and reduces the average $ED^2$ by 30--38\%--- all relative to using variation-awarescheduling together with a simple extension to Intel's Foxtonpower management algorithm.
The emergence of multicore architectures will lead to an increase in the use of multithreaded applications that are prone to synchronization bugs, such as data races. Software solutions for detecting data races generally incur large overheads. Hardware support for race detection can significantly reduce that overhead. However, all existing hardware proposals for race detection are based on the happensbefore algorithm which is sensitive to thread interleaving and cannot detect races that are not exposed during the monitored run. The lockset algorithm addresses this limitation. Unfortunately, due to the challenging issues such as storing the lockset information and performing complex set operations, so far it has been implemented only in software with 10-30 times performance hit. This paper proposes the first hardware implementation (called HARD) of the lockset algorithm to exploit the race detection capability of this algorithm with minimal overhead. HARD efficiently stores lock sets in hardware bloom filters and converts the expensive set operations into fast bitwise logic operations with negligible overhead. We evaluate HARD using six SPLASH-2 applications with 60 randomly injected bugs. Our results show that HARD can detect 54 out of 60 tested bugs, 20% more than happens-before, with only 0.1-2.6% of execution overhead. We also show our hardware design is cost-effective by comparing with the ideal lockset implementation, which would require a large amount of hardware resources.
Lowering supply voltage is one of the most effective techniques for reducing microprocessor power consumption. Unfortunately, at low voltages, chips are very sensitive to process variation, which can lead to large differences in the maximum frequency achieved by individual cores. This paper presents Booster, a simple, low-overhead framework for dynamically rebalancing performance heterogeneity caused by process variation and application imbalance. The Booster CMP includes two power supply rails set at two very low but different voltages. Each core can be dynamically assigned to either of the two rails using a gating circuit. This allows cores to quickly switch between two different frequencies. An on-chip governor controls the timing of the switching and the time spent on each rail. The governor manages a "boost budget" that dictates how many cores can be sped up (depending on the power constraints) at any given time. We present two implementations of Booster: Booster VAR, which virtually eliminates the effects of core-to-core frequency variation in near-threshold CMPs, and Booster SYNC, which additionally reduces the effects of imbalance in multithreaded applications. Evaluation using PARSEC and SPLASH2 benchmarks running on a simulated 32-core system shows an average performance improvement of 11% for Booster VAR and 23% for Booster SYNC.
Reducing traffic accidents is an important public safety challenge, therefore, accident analysis and prediction has been a topic of much research over the past few decades. Using small-scale datasets with limited coverage, being dependent on extensive set of data, and being not applicable for real-time purposes are the important shortcomings of the existing studies. To address these challenges, we propose a new solution for real-time traffic accident prediction using easy-to-obtain, but sparse data. Our solution relies on a deepneural-network model (which we have named DAP, for Deep Accident Prediction); which utilizes a variety of data attributes such as traffic events, weather data, points-of-interest, and time. DAP incorporates multiple components including a recurrent (for time-sensitive data), a fully connected (for time-insensitive data), and a trainable embedding component (to capture spatial heterogeneity). To fill the data gap, we have -through a comprehensive process of data collection, integration, and augmentation -created a large-scale publicly available database of accident information named US-Accidents. By employing the US-Accidents dataset and through an extensive set of experiments across several large cities, we have evaluated our proposal against several baselines. Our analysis and results show significant improvements to predict rare accident events. Further, we have shown the impact of traffic information, time, and pointsof-interest data for real-time accident prediction.
Lowering supply voltage is one of the most effective approaches for improving the energy efficiency of microprocessors. Unfortunately, technology limitations, such as process variability and circuit aging, are forcing microprocessor designers to add larger voltage guardbands to their chips. This makes supply voltage increasingly difficult to scale with technology. This paper presents a new mechanism for dynamically reducing voltage margins while maintaining the chip operating frequency constant. Unlike previous approaches that rely on special hardware to detect and recover from timing violations caused by low-voltage execution, our solution is firmware-based and does not require additional hardware. Instead, it relies on error correction mechanisms already built into modern processors. The system dynamically reduces voltage margins and uses correctable error reports raised by the hardware to identify the lowest, safe operating voltage. The solution adapts to core-to-core variability by tailoring supply voltage to each core's safe operating level. In addition, it exploits variability in workload vulnerability to low voltage execution. The system was prototyped on an HP Integrity Server that uses Intel's Itanium 9560 processors. Evaluation using SPECjbb2005 and SPEC CPU2000 workloads shows core power savings ranging from 18% to 23%, with minimal performance impact.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.