Large-scale shared-memory multiprocessors typically have long latencies for remote data accesses. A key issue for execution performance of many common applications is the synchronization cost. The communication scalability of synchronization has been improved by the introduction of queue-based spin-locks instead of Test&Test&Set. For architectures with long access latencies for global data, attention should also be paid to the number of global accesses that are involved in synchronization. We present a method to characterize the performance of proposed queue lock algorithms, and apply it to previously published algorithms. We also present two new queue locks, the LH lock and the M lock. We compare the locks in terms of performance, memory requirements, code size, and required hardware support. The LH lock is the simplest of all the locks, yet requires only an atomic swap operation. The M lock is superior in terms of global accesses needed to perform synchronization and still competitive in all other criteria. We conclude that the M lock is the best overall queue lock for the class of architectures studied.
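A queue lock of the kind described above can be built from a single atomic swap: each thread enqueues its own node with one exchange on the lock's tail pointer and then spins locally on its predecessor's flag. The following C11 sketch illustrates that style of lock (in the spirit of CLH/LH locks); all names here are illustrative, and this is a minimal single-swap sketch, not the paper's exact algorithm.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative queue-lock sketch: one atomic swap per acquire,
   local spinning on the predecessor's node. */
typedef struct qnode { atomic_bool locked; } qnode;
typedef struct { _Atomic(qnode *) tail; } qlock;

/* The lock starts out pointing at an unlocked dummy node. */
void qlock_init(qlock *l, qnode *dummy) {
    atomic_store(&dummy->locked, false);
    atomic_store(&l->tail, dummy);
}

/* Enqueue my node with a single atomic exchange, then spin on the
   predecessor's flag.  Returns the predecessor's node, which the
   caller may reuse for its next acquire. */
qnode *qlock_acquire(qlock *l, qnode *me) {
    atomic_store(&me->locked, true);
    qnode *pred = atomic_exchange(&l->tail, me);  /* the only atomic RMW */
    while (atomic_load(&pred->locked))
        ;                                         /* local spin */
    return pred;
}

void qlock_release(qnode *me) {
    atomic_store(&me->locked, false);             /* hand the lock on */
}
```

Because each waiter spins on a flag written by exactly one other thread, contention generates far fewer global accesses than a Test&Test&Set loop hammering a shared word.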
Multiprocessors providing a shared-memory view to the programmer are typically implemented as such, with a shared memory. We introduce an architecture with large caches to reduce latency and network load. Because all system memory resides in the caches, a minimum number of network accesses are needed. Still, it presents a shared-memory view to the programmer. Single bus. Shared-memory systems based on a single bus have some tens of processors, each one with a local cache, and typically suffer from bus saturation. A cache-coherence protocol in each cache snoops the traffic on the common bus and prevents inconsistencies in cache contents. Computers manufactured by Sequent and Encore use this kind of architecture. Because it provides a uniform access time to the whole shared memory, it is called a uniform memory architecture (UMA). The contention for the common memory and the common bus limits the scalability of UMAs.
The widening memory gap reduces performance of applications with poor data locality. Therefore, there is a need for methods to analyze data locality and help application optimization. In this paper we present StatCache, a novel sampling-based method for performing data-locality analysis on realistic workloads. StatCache is based on a probabilistic model of the cache, rather than a functional cache simulator. It uses statistics from a single run to accurately estimate miss ratios of fully-associative caches of arbitrary sizes and generate working-set graphs. We evaluate StatCache using the SPEC CPU2000 benchmarks and show that StatCache gives accurate results even at very low sampling rates. We also provide a proof-of-concept implementation, and discuss potentially very fast implementation alternatives.