As chip multiprocessors (CMPs) become more popular, the memory wall problem grows more serious. Making good use of the on-chip shared cache is therefore more important on CMPs than on other multiprocessor architectures. In this paper, we analyze the performance of the traditional spatially decomposed parallel implementation of the red-black algorithm and find that this parallel model does not exploit the temporal data locality of the application. We then restructure the red-black algorithm as a producer-consumer thread pipeline. Under this thread-level pipeline model, consumer threads can reuse the data that the preceding producers have fetched into the shared cache, which reduces the number of cache misses. Our experimental results show that, on Core 2, the application achieves about 40% additional performance improvement under the thread-level pipeline parallel model. Furthermore, we propose a hardware synchronization mechanism to support this model and discuss the model's scalability.
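The producer-consumer restructuring can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the 5-point Laplace stencil, the row-granularity hand-off through a queue, and the names `relax_row`, `sweep_sequential`, `sweep_pipelined`, and `make_grid` are all assumptions made for illustration. A producer thread relaxes the red points row by row, and a consumer thread relaxes the black points of a row as soon as the red rows it depends on are complete, so it touches data the producer has just brought into cache. Python threads do not run the two stages truly in parallel, so this shows only the dependency structure, not the speedup.

```python
import queue
import threading


def relax_row(u, i, color):
    """Relax the points of one colour in interior row i (5-point stencil)."""
    n = len(u[i])
    for j in range(1, n - 1):
        if (i + j) % 2 == color:
            u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                              + u[i][j - 1] + u[i][j + 1])


def sweep_sequential(u):
    """Reference sweep: every red row first, then every black row."""
    for i in range(1, len(u) - 1):
        relax_row(u, i, 0)
    for i in range(1, len(u) - 1):
        relax_row(u, i, 1)


def sweep_pipelined(u):
    """One red-black sweep as a two-stage thread pipeline (illustrative)."""
    last = len(u) - 2          # index of the last interior row
    if last < 1:
        return                 # no interior rows to relax
    done = queue.Queue()       # carries indices of finished red rows

    def producer():
        for i in range(1, last + 1):
            relax_row(u, i, 0)  # red points of row i
            done.put(i)
        done.put(None)          # sentinel: red sweep finished

    def consumer():
        while True:
            i = done.get()
            if i is None:
                break
            # Black row i-1 reads red rows i-2, i-1 and i, all complete now.
            if i - 1 >= 1:
                relax_row(u, i - 1, 1)
        relax_row(u, last, 1)   # last black row, after all red rows

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def make_grid(n):
    """n x n grid of zeros with the top boundary row held at 1.0."""
    u = [[0.0] * n for _ in range(n)]
    u[0] = [1.0] * n
    return u
```

Because the consumer lags the producer by one row, each black update sees exactly the values it would see in the sequential sweep, so the two versions produce identical grids. Repeated sweeps in this sketch simply call `sweep_pipelined` again after both threads have joined.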