The emergence of multicore architectures will lead to an increase in the use of multithreaded applications that are prone to synchronization bugs, such as data races. Software solutions for detecting data races generally incur large overheads. Hardware support for race detection can significantly reduce that overhead. However, all existing hardware proposals for race detection are based on the happensbefore algorithm which is sensitive to thread interleaving and cannot detect races that are not exposed during the monitored run. The lockset algorithm addresses this limitation. Unfortunately, due to the challenging issues such as storing the lockset information and performing complex set operations, so far it has been implemented only in software with 10-30 times performance hit. This paper proposes the first hardware implementation (called HARD) of the lockset algorithm to exploit the race detection capability of this algorithm with minimal overhead. HARD efficiently stores lock sets in hardware bloom filters and converts the expensive set operations into fast bitwise logic operations with negligible overhead. We evaluate HARD using six SPLASH-2 applications with 60 randomly injected bugs. Our results show that HARD can detect 54 out of 60 tested bugs, 20% more than happens-before, with only 0.1-2.6% of execution overhead. We also show our hardware design is cost-effective by comparing with the ideal lockset implementation, which would require a large amount of hardware resources.
Memory can be efficiently utilized if the dynamic memory demands of applications can be determined and analyzed at run-time. The page miss ratio curve(MRC), i.e. page miss rate vs. memory size curve, is a good performancedirected metric to serve this purpose. However, dynamically tracking MRC at run time is challenging in systems with virtual memory because not every memory reference passes through the operating system (OS).This paper proposes two methods to dynamically track MRC of applications at run time. The first method is using a hardware MRC monitor that can track MRC at fine time granularity. Our simulation results show that this monitor has negligible performance and energy overheads. The second method is an OS-only implementation that can track MRC at coarse time granularity. Our implementation results on Linux show that it adds only 7-10% overhead.We have also used the dynamic MRC to guide both memory allocation for multiprogramming systems and memory energy management. Our real system experiments on Linux with applications including Apache Web Server show that the MRC-directed memory allocation can speed up the applications' execution/response time by up to a factor of 5.86 and reduce the number of page faults by up to 63.1%. Our execution-driven simulation results with SPEC2000 benchmarks show that the MRC-directed memory energy management can improve the Energy * Delay metric by 27-58% over previously proposed static and dynamic schemes.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.