Main memory is a major shared resource among cores in a multicore system. If Technical Report No. 2011-002 (June 3, 2011 system configurations show that this integrated memory partitioning and scheduling approach provides better system performance than MCP and four previous memory scheduling algorithms employed alone. Averaged over 240 workloads on a 24-core system with 4 memory channels, IMPS improves system throughput by 11.1% over an application unaware scheduler and 5% over the current best scheduling policy, while incurring much lower hardware complexity than the latter.
Applications running concurrently on a multicore system in terfere with each other at the main memory. This interference can slow down diff erent applications diff erently. Accurately es timating the slowdown of each application in such a system can enable mechanisms that can enforce quality-of-service. While much prior work has focused on mitigating the performance degradation due to inter-application interference, there is lit tle work on estimating slowdown of individual applications in a multi-programmed environment. Dur goal in th is work is to build such an estimation scheme.To this end, we present our simple Memory-Interference induced Slowdown Estimation (MISE) model that estimates slowdowns caused by memory interference. We build our model based on two observations. First, the performance of a memory bound application is roughly proportional to the rate at which its memory requests are served, suggesting that request-service rate can be used as a proxy for performance. Second, when an application's requests are prioritized over all other applica tions' requests, the application experiences very little interfer ence from other applications. This provides a means for esti mating the uninterfered request-service-rate of an application while it is run alongside other applications. Using the above observations, our model estimates the slowdown of an applica tion as the ra tio of its uninterfered and interfered request service rates. We propose simple changes to the above model to estimate the slowdown of non-memory-bound applications.We demonstrate the effectiveness of our model by develop ing two new memory scheduling schemes: 1) one that provides soft quality-of-service guarantees and 2) another that explicitly attempts to minimize maximum slowdown (i.e., unfairness) in the system. Evaluations show that our techniques perform sig nificantly better than state-of-the-art memory scheduling ap proaches to address the above problems.
Abstract-In a multicore system, applications running on different cores interfere at main memory. This inter-application interference degrades overall system performance and unfairly slows down applications. Prior works have developed applicationaware memory request schedulers to tackle this problem. State-ofthe-art application-aware memory request schedulers prioritize memory requests of applications that are vulnerable to interference, by ranking individual applications based on their memory access characteristics and enforcing a total rank order.In this paper, we observe that state-of-the-art applicationaware memory schedulers have two major shortcomings. First, ranking applications individually with a total order based on memory access characteristics leads to high hardware cost and complexity. Second, ranking can unfairly slow down applications that are at the bottom of the ranking stack. To overcome these shortcomings, we propose the Blacklisting Memory Scheduler (BLISS), which achieves high system performance and fairness while incurring low hardware cost and complexity. BLISS design is based on two new observations. First, we find that, to mitigate interference, it is sufficient to separate applications into only two groups, one containing applications that cause interference and another containing applications vulnerable to interference, instead of ranking individual applications with a total order. Vulnerable-to-interference group is prioritized over the interference-causing group. Second, we show that this grouping can be efficiently performed by simply counting the number of consecutive requests served from each applicationan application that has a large number of consecutive requests served is dynamically classified as interference-causing.We evaluate BLISS across a wide variety of workloads and system configurations and compare its performance and complexity with five state-of-the-art memory schedulers. Our evaluations show that BLISS achieves 5% better system performance and 25% better fairness than the best-performing previous memory scheduler while greatly reducing critical path latency and hardware area cost of the memory scheduler (by 79% and 43%, respectively).
Abstract-In a multicore system, applications running on different cores interfere at main memory. This inter-application interference degrades overall system performance and unfairly slows down applications. Prior works have developed application-aware memory request schedulers to tackle this problem. State-of-the-art application-aware memory request schedulers prioritize memory requests of applications that are vulnerable to interference, by ranking individual applications based on their memory access characteristics and enforcing a total rank order.In this paper, we observe that state-of-the-art application-aware memory schedulers have two major shortcomings. First, such schedulers trade off hardware complexity in order to achieve high performance or fairness, since ranking applications individually with a total order based on memory access characteristics leads to high hardware cost and complexity. Such complexity could prevent the scheduler from meeting the stringent timing requirements of state-of-the-art DDR protocols. Second, ranking can unfairly slow down applications that are at the bottom of the ranking stack, thereby sometimes leading to high slowdowns and low overall system performance. To overcome these shortcomings, we propose the Blacklisting Memory Scheduler (BLISS), which achieves high system performance and fairness while incurring low hardware cost and complexity. BLISS design is based on two new observations. First, we find that, to mitigate interference, it is sufficient to separate applications into only two groups, one containing applications that are vulnerable to interference and another containing applications that cause interference, instead of ranking individual applications with a total order. Vulnerable-to-interference group is prioritized over the interference-causing group. Second, we show that this grouping can be efficiently performed by simply counting the number of consecutive requests served from each application -an application that has a large number of consecutive requests served is dynamically classified as interference-causing.We evaluate BLISS across a wide variety of workloads and system configurations and compare its performance and hardware complexity (via RTL implementations), with five state-of-the-art memory schedulers. Our evaluations show that BLISS achieves 5% better system performance and 25% better fairness than the best-performing previous memory scheduler while greatly reducing critical path latency and hardware area cost of the memory scheduler (by 79% and 43%, respectively), thereby achieving a good trade-off between performance, fairness and hardware complexity.
Variation has been shown to exist across the cells within a modern DRAM chip. Prior work has studied and exploited several forms of variation, such as manufacturing-processor temperature-induced variation. We empirically demonstrate a new form of variation that exists within a real DRAM chip, induced by the design and placement of different components in the DRAM chip: different regions in DRAM, based on their relative distances from the peripheral structures, require different minimum access latencies for reliable operation. In particular, we show that in most real DRAM chips, cells closer to the peripheral structures can be accessed much faster than cells that are farther. We call this phenomenon design-induced variation in DRAM. Our goals are to i) understand design-induced variation that exists in real, stateof-the-art DRAM chips, ii) exploit it to develop low-cost mechanisms that can dynamically find and use the lowest latency at which to operate a DRAM chip reliably, and, thus, iii) improve overall system performance while ensuring reliable system operation.To this end, we first experimentally demonstrate and analyze designed-induced variation in modern DRAM devices by testing and characterizing 96 DIMMs (768 DRAM chips). Our characterization identifies DRAM regions that are vulnerable to errors, if operated at lower latency, and finds consistency in their locations across a given DRAM chip generation, due to design-induced variation. Based on our extensive experimental analysis, we develop two mechanisms that reliably reduce DRAM latency. First, DIVA Profiling uses runtime profiling to dynamically identify the lowest DRAM latency that does not introduce failures. DIVA Profiling exploits designinduced variation and periodically profiles only the vulnerable regions to determine the lowest DRAM latency at low cost. It is the first mechanism to dynamically determine the lowest latency that can be used to operate DRAM reliably. DIVA Profiling reduces the latency of read/write requests by 35.1%/57.8%, respectively, at 55℃. Our second mechanism, DIVA Shuffling, shuffles data such that values stored in vulnerable regions are mapped to multiple error-correcting code (ECC) codewords. As a result, DIVA Shuffling can correct 26% more multi-bit errors than conventional ECC. Combined together, our two mechanisms reduce read/write latency by 40.0%/60.5%, which translates to an overall system performance improvement of 14.7%/13.7%/13.8% (in 2-/4-/8-core systems) across a variety of workloads, while ensuring reliable operation.arXiv:1610.09604v2 [cs.AR] 15 May 2017 * We report the manufacturing date in a year-week (yy-ww) format. For example, 15-01 means that the DIMM was manufactured during the first week of 2015. † We report two representative timing factors: F r eq (the data transfer frequency per pin) and t RC (the row access cycle time). ‡ The maximum DRAM chip size supported by our testing platform is 2Gb. § We report the DRAM die versions that are marked on the chip package. Since the die version changes when the DR...
No abstract
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.