In this paper we examine the application of offline algorithms for determining the optimal sequence of loads and superloads (a load of multiple consecutive cache lines) for direct-mapped caches. We evaluate the potential gains in terms of miss rate and bandwidth and find that in many cases optimal superloading can noticeably reduce the miss rate without appreciably increasing bandwidth. We then examine how this performance potential might be realized: we evaluate the effectiveness of a dynamic online algorithm and of static analysis (profiling) for superloading, and compare both to next-line prefetching. Experimental results show miss-rate improvements comparable to those of the optimal algorithm.

§1 Introduction

Since their introduction over thirty years ago, caches have become ubiquitous components of the memory hierarchy. Caches have been successful because programs exhibit locality: spatial locality, the tendency for neighboring memory locations to be referenced close together in time; and temporal locality, the tendency for locations referenced in the recent past to be referenced again in the near future. However, as processor speeds increase much faster than memory latencies decrease, the efficiency of caches has come under greater scrutiny.

Combinations of hardware and software techniques have been proposed, and often implemented, to improve locality and to reduce or tolerate memory latency. The basic goal is to reduce cache miss rates without unduly increasing the number of bytes transferred between levels of the memory hierarchy. When couched in terms of improving spatial locality for data caches, the main theme of this paper, the usual policy is to support larger cache lines.
This policy has potential drawbacks: the cache miss rate may rise because of more frequent conflict misses and because portions of the larger lines go unused, and servicing each miss occupies the bus between levels of the memory hierarchy for longer. To mitigate these effects while still exploiting large lines when profitable, we examine the potential benefit of a cache controller that, on a miss, either loads the missing regular-size line (hereafter called the base case) or superloads, i.e., brings the missing line and its surrounding lines into the cache. Note that the advantages of superloading depend on the cost model for the level of the memory hierarchy under investigation; of particular importance are the relative costs of a load and a superload.

Although the impact of these techniques has been investigated using heuristics and software or hardware assists, it is not known how much could be gained if they were applied optimally. In this paper, after briefly introducing an optimal offline algorithm for choosing between loads and superloads, we derive the margin of maximum possible improvement on the integer Spec95 benchmark suite. We then analyze the performance, compared to the optimal and base cases, ...
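To make the miss-rate/bandwidth trade-off concrete, the following sketch simulates a small direct-mapped cache under the two fixed policies described above: the base case (load only the missing line) and unconditional superloading (load the missing line plus its neighbors). All names, sizes, and the toy trace are our own illustrative assumptions; the paper's offline algorithm instead chooses per miss, which this sketch does not attempt.

```python
# Hypothetical sketch (parameters and names are our own): a direct-mapped
# cache simulator contrasting the base policy with unconditional
# superloading on a word-address trace.

LINE_SIZE = 4   # words per cache line (illustrative)
NUM_SETS = 8    # direct-mapped: one line per set (illustrative)

def simulate(trace, superload=False, radius=1):
    """Return (misses, lines_transferred) for a list of word addresses."""
    tags = [None] * NUM_SETS
    misses = 0
    transferred = 0
    for addr in trace:
        line = addr // LINE_SIZE
        if tags[line % NUM_SETS] != line:
            misses += 1
            # Base case: fetch only the missing line.
            # Superload: also fetch `radius` neighbors on each side.
            lines = range(line - radius, line + radius + 1) if superload else [line]
            for l in lines:
                if l >= 0:
                    tags[l % NUM_SETS] = l
                    transferred += 1
    return misses, transferred

# Sequential trace of 32 words: superloading halves the misses but
# transfers more lines, including a wasted trailing neighbor per miss.
trace = list(range(32))
print(simulate(trace))                  # → (8, 8)
print(simulate(trace, superload=True))  # → (4, 11)
```

On this purely sequential toy trace superloading looks attractive; the interesting cases, and the reason a per-miss choice matters, are traces where the extra lines evict useful data or go unreferenced.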