Utilizing analytical models to evaluate proposals or guide high-level architecture decisions has become increasingly attractive. Over the last decade, a number of methods have emerged that offer quantified insights into cache behavior, such as stack distance theory and memory-level parallelism (MLP) estimation. However, prior research typically oversimplified the factors at play in out-of-order processors, such as the effects of reordered memory instructions, multiple dependences among memory instructions, and accesses merged into the same MSHR entry. Ignoring these influences results in the low and unstable accuracy of recent analytical models.
By quantifying the aforementioned effects, this article proposes a cache performance evaluation framework equipped with three analytical models that more accurately predict cache misses, MLP, and the average cache miss service time, respectively. As in prior studies, these analytical models are fed with profiled software characteristics, so the architecture evaluation process can be accelerated significantly compared with cycle-accurate simulation.
We evaluate the accuracy of the proposed models against gem5 cycle-accurate simulations with 16 benchmarks chosen from Mobybench Suite 2.0, MiBench 1.0, and MediaBench II. The average root mean square errors for predicting cache misses, MLP, and the average cache miss service time are around 4%, 5%, and 8%, respectively. Meanwhile, the average error of our framework in predicting the stall time due to cache misses is as low as 8%. The whole cache performance estimation can be sped up by about 15 times versus gem5 cycle-accurate simulations and by 4 times compared with recent studies. Furthermore, using our models, we reveal and study the relationships between different performance metrics and the reorder buffer size. As an application of the framework, we also demonstrate how to combine it with McPAT to find Pareto-optimal configurations in cache design space exploration.
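To illustrate the Pareto-optimal selection step mentioned above, here is a minimal Python sketch. The `(name, delay, energy)` tuple format and the choice of two minimized metrics are illustrative assumptions, standing in for whatever performance and power numbers the framework and McPAT actually produce.

```python
def pareto_front(configs):
    """Return the Pareto-optimal subset of (name, delay, energy) tuples,
    minimizing both metrics: a configuration is kept only if no other
    configuration is at least as good on both metrics and strictly
    better on at least one."""
    front = []
    for name, d, e in configs:
        dominated = any(
            (d2 <= d and e2 <= e) and (d2 < d or e2 < e)
            for _, d2, e2 in configs
        )
        if not dominated:
            front.append((name, d, e))
    return front

# Example: "c" is dominated by "b" (worse on both metrics), so only
# the three non-dominated configurations remain on the front.
front = pareto_front([("a", 1, 5), ("b", 2, 2), ("c", 3, 3), ("d", 5, 1)])
```

For the small design spaces typical of cache exploration, this quadratic scan is sufficient; larger spaces would call for a sort-based sweep.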
Stack or reuse distances have been widely adopted in studying memory locality and cache behavior. However, the memory references, normally profiled by a binary instrumentation tool, only reflect the access sequence of instruction fetches and load or store executions. As a result, the stack or reuse distances obtained from these memory references cannot be used to predict L2 or lower-level cache misses. This paper proposes a probability model that calculates the L2 reuse distance histogram from the L1 stack distance histograms without any extra simulation. L2 cache misses and memory locality can then be predicted quickly and accurately from the model's results. We use 13 benchmarks chosen from Mobybench 2.0 and SPEC 2006 to evaluate the accuracy of our model. With the support of StatCache and StatStack, the average absolute error in modeling L2 cache misses is about 8%. Meanwhile, compared with gem5 fast simulations, the process of predicting L2 cache misses can be sped up by 50 times on average.
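To illustrate the stack distance concept both abstracts build on, here is a minimal Python sketch that computes an LRU stack distance histogram from an address trace. The list-based bookkeeping (O(n·m) time) and the plain-address trace format are illustrative assumptions, not the profiling tooling or probability model described in the paper.

```python
from collections import Counter

def stack_distance_histogram(trace):
    """Compute the LRU stack distance of each reference in an address
    trace and return a histogram (distance -> count). The distance is
    the number of distinct addresses touched since the previous
    reference to the same address; a reference hits in a fully
    associative LRU cache of C blocks iff its distance < C. First
    touches (cold misses) are recorded under float('inf')."""
    stack = []           # LRU stack; most recently used address at the end
    hist = Counter()
    for addr in trace:
        if addr in stack:
            depth = len(stack) - 1 - stack.index(addr)
            hist[depth] += 1
            stack.remove(addr)
        else:
            hist[float("inf")] += 1   # first touch: cold miss
        stack.append(addr)            # addr becomes most recently used
    return hist

# Example: in A B A C B A, the second A has distance 1 (only B in
# between), while the later B and A each have distance 2.
hist = stack_distance_histogram(["A", "B", "A", "C", "B", "A"])
```

Summing the histogram counts at distances ≥ C (including infinity) directly yields the miss count for a C-block fully associative LRU cache, which is the quantity the models above predict analytically.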