Cache-Based Synchronization in Shared Memory Multiprocessors

Ramachandran, Umakishore; Lee, Joonwon

doi:10.1006/jpdc.1996.0002

Cited by 9 publications

(5 citation statements)

References 45 publications

“…Their work combines lock synchronization with the cache coherency protocol, which requires extra states in the cache controller [3]. For instance, the lock variables and the state information of these lock variables have to be kept in the cache lines, which requires a larger cache tag and brings a further complexity in the cache/memory system design.…”

Section: Background and Motivation (mentioning)

confidence: 99%

“…Moreover, these different methods cause useful bus cycles to be wasted because of hold cycles. Hold cycles can be described as the cache response time due to simultaneous cache invalidations in case of a lock release [8,7,3]. Therefore, the efficiency of these techniques is dependent on the application program characteristics and the architecture, such as how frequent the locking attempts occur in the application (i.e., how many CSes exist in the program) or whether there are lots of processors making use of locks.…”

Section: Background and Motivation (mentioning)

confidence: 99%
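
The invalidation cost that the statement above attributes to a lock release can be illustrated with a toy count. The sketch below is not taken from the cited papers: it assumes a snooping bus, an invalidation-based coherence protocol, and waiters that spin on a cached copy of the lock (test-and-test-and-set). The function names and the per-release cost breakdown are hypothetical and chosen only for illustration.

```python
def bus_transactions_per_release(waiters: int) -> int:
    """Toy count of bus transactions triggered by one lock release.

    Assumed model (an illustration, not the cited papers' model):
      - 1 write by the releasing processor, invalidating every cached copy
      - 1 read miss per waiter to refetch the lock line
      - 1 atomic test-and-set attempt per waiter
    """
    if waiters < 0:
        raise ValueError("waiters must be non-negative")
    return 1 + 2 * waiters


def total_bus_transactions(processors: int) -> int:
    """Total for a burst in which each of `processors` CPUs takes the lock once.

    The first holder releases with processors - 1 waiters, the next with
    processors - 2, and so on down to 0 waiters.
    """
    return sum(bus_transactions_per_release(w) for w in range(processors))


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, total_bus_transactions(n))
```

Under these assumptions the total grows with the square of the number of contending processors, which is one way to see why the cost depends on how many processors use locks and how often.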

A system-on-a-chip lock cache with task preemption support

Akgul, Lee, Mooney (2001)

Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems - CASES '01

Intertask/interprocess synchronization overheads may be significant in a shared-memory multiprocessor System-on-a-Chip implementation. These overheads are observed in terms of lock latency, lock delay and memory bandwidth consumption in the system. It has been shown that a hardware solution brings a much better performance improvement than the synchronization algorithms developed in software [3]. Our previous work presented a SoC Lock Cache (SoCLC) hardware mechanism which resolves the Critical Section (CS) interactions among multiple processors and improves lock latency, lock delay and bandwidth consumption in a shared-memory multiprocessor SoC for short CSes [1]. This paper extends our previous work to support long CSes as well. This combined support involves modifications both in the RTOS kernel-level facilities (such as support for preemptive versus non-preemptive synchronization, interrupt handling and RTOS initialization) and in the hardware mechanism. The worst-case simulation results of a database application model with a client-server pair of tasks on a four-processor system showed that our mechanism achieved a 57% improvement in lock latency, a 14% speedup in lock delay and a 35% overall speedup in total execution time.

“…Some researchers have also proposed the cache memory with synchronization [12,14]. However, they are special mechanisms to the mutual exclusion using the lock and the unlock.…”

Section: Related Work (mentioning)

confidence: 98%

Coherence Maintenances to realize an efficient parallel processing for a Cache Memory with Synchronization on a Chip-Multiprocessor

Yamawaki, Iwane

8th International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'05)

A chip-multiprocessor is one of the promising architectures that can overcome the ILP limitation, high power consumption and high heat dissipation that current processors face. On a shared memory multiprocessor, performance improvement relies on an efficient communication and synchronization method via shared variables. The TSVM cache combines communication and synchronization with coherence maintenance on a chip-multiprocessor. That is, communication and synchronization via shared variables are realized by one coherence transaction through a high-speed on-chip interconnection. The TSVM cache provides several instructions, each with its own coherence maintenance scheme. Combinations of these instructions can realize producer-consumer synchronization, mutual exclusion and barrier synchronization with communication easily and systematically. This paper describes how those instructions construct the three primitives and shows the effect of these primitives using a clock-cycle-accurate simulator written in VHDL. The results show that the TSVM cache can improve performance by a factor of 9.8 compared with a traditional cache memory, and by a factor of 2 compared with a conventional cache memory with a synchronization mechanism.

“…A large suite of communication mechanisms has been compared for producer-consumer patterns in [71], including prefetching, deliver [41], write-send [40], update-based coherence, selective updates [42], cache-based locks [72,73], and a cache-based message passing scheme called streamline [74]. Streamline is found to perform best on benchmarks with regular communication patterns.…”

Section: Coherent Shared Memory Optimizations (mentioning)

confidence: 99%

Direct communication and synchronization mechanisms in chip multiprocessors

Καββαδίας

… communication mechanisms. Associating the network interface with on-chip memory allows it to flexibly handle transfers of a few bytes up to several kilobytes. It also allows for processor-decoupled (or asynchronous) network interface operation that can overlap bulk transfers with computation to inexpensively hide latencies, without the need for non-blocking caches. A simple DMA engine can support bulk transfers from and into scratchpad memory, without necessitating processor architecture adaptation to data transfer requirements as in the case of vector and out-of-order processors.

One additional issue has to be addressed regarding scratchpad memories and network interfaces in the processor environment. Low-latency access is indispensable for their utility in the on-chip environment of general-purpose many-core processors, which makes any interaction with the operating system prohibitive in the common case. In order to support concurrent and protected access by multiple processes, scratchpads and their associated NI must be accessible at user level.

Protected, user-level access is achievable via memory mapping of resources. In addition, the close coupling of the network interface with the processor can facilitate translation and protection mechanisms in the network interface. Such mechanisms will enable application-space arguments to communication (e.g. virtual addresses for communication endpoints) while circumventing the operating system in the common case. Conversely, receiving transferred data in user-level accessible scratchpad memory avoids the need for copying between kernel and user memory space.

Caches and user-level accessible scratchpads, utilizing the organization of figure 1.1(b), exploit the advantage that computation occurs "in-place", in the same memory where data are fetched to, without copying. This advantage occurs naturally, because the memory used for computation is also the "communication memory" managed by the cache controller or the network interface. This thesis advocates a virtualized network interface closely coupled to the processor that supports fast local data access and communication initiation at user level, allows software-controlled data transfer and placement, and exploits NI memory for computation.
