Hybrid access-specific software cache techniques for the Cell BE architecture

González, Marc; Vujic, Nikola; Martorell, Xavier; Ayguadé, Eduard; Eichenberger, Alexandre E.; Chen, Tong; Sura, Zehra; Zhang, Tao; O’Brien, Kevin; O’Brien, Kathryn

doi:10.1145/1454115.1454156

Cited by 41 publications

(41 citation statements)

References 14 publications

“…Instead, the compiler has to wrap the potentially incoherent accesses with a piece of code that does a software lookup to check if some SPM has a copy of the data, triggers a fine-grained DMA transfer to bring the data to the SPM of the local core, accesses it, and triggers a DMA transfer back if the data is modified. This solution adds huge overheads, as observed in previous works in the area of software caches [22]. This paper avoids these overheads by allowing all the cores to access all the SPMs and by proposing a hierarchy of directories and filters that efficiently tracks the contents of all SPMs and diverts the potentially incoherent accesses to any SPM of the chip.…”

Section: SPM Management in Hybrid Memory Systems (mentioning)

confidence: 97%
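
The guarded access path described in the statement above (software lookup, DMA in, access, DMA back on a write) can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `GuardedMemory`, its methods, and the transfer counter are hypothetical names introduced here, memory is modeled as a Python list, and each SPM as a dictionary. It is not the code generated by the compiler in the cited work.

```python
class GuardedMemory:
    """Toy model of a software-guarded access (all names are hypothetical)."""

    def __init__(self, main_memory, num_cores):
        self.main = main_memory                          # list standing in for global memory
        self.spm = [dict() for _ in range(num_cores)]    # per-core SPM: address -> value
        self.dma_transfers = 0                           # number of simulated DMA transfers

    def map_to_spm(self, core, addr):
        """Copy one word from global memory into the SPM of `core`."""
        self.spm[core][addr] = self.main[addr]
        self.dma_transfers += 1

    def _lookup(self, addr):
        """Software lookup: return the core whose SPM holds `addr`, or None."""
        for core, contents in enumerate(self.spm):
            if addr in contents:
                return core
        return None

    def guarded_access(self, core, addr, new_value=None):
        """Read `addr` (or write it when `new_value` is given) through the guard."""
        owner = self._lookup(addr)
        if owner is None:
            # No SPM holds a copy, so global memory is the valid version.
            if new_value is not None:
                self.main[addr] = new_value
            return self.main[addr]
        remote = owner != core
        # Fine-grained DMA that brings the valid copy to the local core.
        value = self.spm[owner][addr]
        if remote:
            self.dma_transfers += 1
        if new_value is not None:
            # Modified data is transferred back to the SPM that owns it.
            value = new_value
            self.spm[owner][addr] = value
            if remote:
                self.dma_transfers += 1
        return value
```

Every guarded access pays for the lookup, and a remote hit pays for one or two extra transfers, which is the kind of overhead the quoted statement refers to.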

“…For a computational loop the code is transformed into a two-nested loop that uses tiling to do the computation [19,20,22,41], as shown in Figure 3. Each iteration of the outermost loop executes three phases: (1) a control phase that maps chunks of the array sections to the SPMs, (2) a synchronization phase that waits for the completion of the DMA transfers, and (3) a work phase that performs the computation for the currently mapped chunks of data.…”

Section: Compiler and Runtime Support (mentioning)

confidence: 99%
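
The three-phase tiled loop described in the statement above can be sketched as follows. This is a minimal illustration, assuming a dot product as the computation and simulating DMA with plain Python slices; `dma_get`, `dma_wait`, and `tiled_dot` are hypothetical names introduced here and do not come from the cited runtime library.

```python
def dma_get(source, start, end):
    """Simulated asynchronous DMA request: returns a handle to wait on."""
    return lambda: source[start:end]


def dma_wait(request):
    """Simulated synchronization: completes the request and returns the SPM buffer."""
    return request()


def tiled_dot(a, b, chunk_size):
    """Dot product computed tile by tile, one chunk of each array at a time."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same length")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = 0
    # Outer loop: one iteration per tile of the iteration space.
    for start in range(0, len(a), chunk_size):
        end = min(start + chunk_size, len(a))
        # (1) Control phase: map the current chunks of the arrays to the SPM.
        req_a = dma_get(a, start, end)
        req_b = dma_get(b, start, end)
        # (2) Synchronization phase: wait for the DMA transfers to complete.
        spm_a = dma_wait(req_a)
        spm_b = dma_wait(req_b)
        # (3) Work phase: inner loop computes on the mapped chunks only.
        for i in range(end - start):
            total += spm_a[i] * spm_b[i]
    return total
```

For example, `tiled_dot([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], 2)` processes three tiles of sizes 2, 2 and 1 and returns the same value as the untiled loop.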

“…The runtime library that manages the SPMs is a modified version of a software cache for the Cell B.E. [22] that has been ported to the hybrid memory system and optimized. Double buffering is not used.…”

Section: Experimental Framework (mentioning)

confidence: 99%

“…A promising solution to exploit these characteristics in shared memory manycores is to introduce the hybrid memory system and to give the compiler the responsibility of generating code to manage the SPMs, so that the added programming complexity is not exposed to the programmer. Even though compilers succeed in generating code for the SPMs when the computation is based on predictable memory access patterns [22], in the presence of unpredictable memory accesses they encounter important limitations. Due to the incoherence between the SPMs and the cache hierarchy, the compiler cannot generate code for the SPMs if it cannot ensure that there is no aliasing between two memory references that may target copies of the same data in the SPMs and in the cache hierarchy.…”

Section: Introduction (mentioning)

confidence: 99%

Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures

Alvarez, Vilanova, Moretó et al. 2015

Proceedings of the 42nd Annual International Symposium on Computer Architecture

Self Cite

The increasing number of cores in manycore architectures causes important power and scalability problems in the memory subsystem. One solution is to introduce scratchpad memories alongside the cache hierarchy, forming a hybrid memory system. Scratchpad memories are more power-efficient than caches and they do not generate coherence traffic, but they suffer from poor programmability. A good way to hide the programmability difficulties from the programmer is to give the compiler the responsibility of generating code to manage the scratchpad memories. Unfortunately, compilers do not succeed in generating this code in the presence of random memory accesses with unknown aliasing hazards. This paper proposes a coherence protocol for the hybrid memory system that allows the compiler to always generate code to manage the scratchpad memories. In coordination with the compiler, memory accesses that may access stale copies of data are identified and diverted to the valid copy of the data. The proposal allows the architecture to be exposed to the programmer as a shared memory manycore, maintaining the programming simplicity of shared memory models and preserving backwards compatibility. In a 64-core manycore, the coherence protocol adds overheads of 4% in performance, 8% in network traffic and 9% in energy consumption to enable the usage of the hybrid memory system that, compared to a cache-based system, achieves a speedup of 1.14x and reduces on-chip network traffic and energy consumption by 29% and 17%, respectively.

“…Compilers succeed in generating code for LMs when the computation is based on predictable memory access patterns [7] but, when non-predictable memory access patterns are found, compilers need to ensure correctness by applying complex analyses such as memory aliasing [8], [9], [10]. When compilers cannot ensure that there is no aliasing between two memory references that may target copies of the same data in the LM and in the cache hierarchy, they must conservatively avoid using the LM.…”

Section: Introduction (mentioning)

confidence: 99%

Hardware-software coherence protocol for the coexistence of caches and local memories

Alvarez, Vilanova, González et al. 2012

2012 International Conference for High Performance Computing, Networking, Storage and Analysis

Self Cite

Cache coherence protocols limit the scalability of multicore and manycore architectures and are responsible for an important amount of the power consumed in the chip. A good way to alleviate these problems is to introduce a local memory alongside the cache hierarchy, forming a hybrid memory system. Local memories are more power-efficient than caches and do not generate coherence traffic, but they suffer from poor programmability. When non-predictable memory access patterns are found, compilers do not succeed in generating code because of the incoherence between the two storages. This paper proposes a coherence protocol for hybrid memory systems that allows the compiler to generate code even in the presence of memory aliasing problems. Coherence is ensured by a software/hardware co-design where the compiler identifies potentially incoherent memory accesses and the hardware diverts them to the correct copy of the data. The coherence protocol introduces overheads of 0.26% in execution time and of 2.03% in energy consumption to enable the usage of the hybrid memory system, which outperforms cache-based systems with a speedup of 38% and an energy reduction of 27%.

Adaptive and Speculative Memory Consistency Support for Multi-core Architectures with On-Chip Local Memories

Vujic, Alvarez, Tallada et al. 2010

Languages and Compilers for Parallel Computing
