Characterization and modeling of multicast communication in cache-coherent manycore processors

Abadal, Sergi; Martínez, Rosario de Vicente; Solé-Pareta, Josep; Alarcón, Eduard; Cabellos‐Aparicio, Albert

doi:10.1016/j.compeleceng.2015.12.018

Cited by 17 publications (22 citation statements)

References 25 publications (50 reference statements)

“…Data is generally distributed (and potentially shared) among a larger number of cores, causing coherence transactions to be more frequent and to involve a larger destination set [33], [34]. This implies that the multicast traffic per instruction increases with the system size for virtually any coherence protocol or interconnect, as shown in Figure 2a, which assumes a tiled architecture with private 32-kB L1-D/L1-I caches, 512-kB of shared L2 per core and three coherence protocols [8]. Results are the geometric mean of all the SPLASH-2 and PARSEC benchmarks.…”

Section: Motivation and Related Work (mentioning)

confidence: 99%

“…To provide hints of performance in more realistic scenarios, we later perform a sensitivity analysis considering bursty and hotspot traffic, which is found in most cache-coherent applications for communications in general [60] and multicast in particular [8]. To generate bursty traffic, we alternate ON/OFF periods, the length of which follows Pareto distributions defined by the Hurst exponent H [61].…”

Section: Traffic Generation (mentioning)

confidence: 99%
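
The ON/OFF burst generation described in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' generator: it assumes the commonly used mapping between the Hurst exponent and the Pareto shape parameter, H = (3 − α)/2 for 1 < α < 2, and the names `pareto_shape_from_hurst`, `on_off_schedule`, and `min_len` are hypothetical.

```python
import random


def pareto_shape_from_hurst(h):
    """Map a Hurst exponent to a Pareto shape, assuming H = (3 - alpha) / 2."""
    if not 0.5 < h < 1.0:
        raise ValueError("this mapping assumes 0.5 < H < 1")
    return 3.0 - 2.0 * h


def on_off_schedule(h, n_periods, min_len=1.0, seed=0):
    """Return a list of (is_on, length) periods that alternate ON and OFF.

    Each period length is min_len times a Pareto variate, so it is never
    shorter than min_len (an assumed minimum period, in arbitrary time units).
    """
    rng = random.Random(seed)
    alpha = pareto_shape_from_hurst(h)
    schedule = []
    is_on = True  # assumption: the source starts in the ON state
    for _ in range(n_periods):
        length = min_len * rng.paretovariate(alpha)
        schedule.append((is_on, length))
        is_on = not is_on
    return schedule


if __name__ == "__main__":
    for is_on, length in on_off_schedule(h=0.8, n_periods=6):
        print("ON " if is_on else "OFF", round(length, 2))
```

Under this mapping, values of H closer to 1 give a heavier tail (smaller α) and therefore occasional very long bursts or silences, while H near 0.5 gives lighter-tailed, less bursty periods.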

“…To model hotspot traffic, we use a Gaussian parameter σ which takes values between 0 (concentrated) and ∞ (spread out) and describes the percentage of load that is assigned to each node [60]. More details can be found in [8].…”

Section: Traffic Generation (mentioning)

confidence: 99%
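
One way to read the σ parameter in the statement above is as the width of a Gaussian weighting over nodes. The sketch below is an assumption, not the formulation of [60] or [8]: it places a single hotspot at a hypothetical node index `center`, weights each node by its index distance from that hotspot, and normalizes the weights into load shares. The function name `hotspot_shares` is also hypothetical.

```python
import math


def hotspot_shares(n_nodes, sigma, center=0):
    """Return the fraction of total load assigned to each node.

    sigma == 0 puts all load on the hotspot node; sigma == inf spreads the
    load uniformly. In between, shares follow a normalized Gaussian over the
    index distance to `center` (an assumed, one-dimensional node layout).
    """
    if sigma == 0:
        return [1.0 if i == center else 0.0 for i in range(n_nodes)]
    if math.isinf(sigma):
        return [1.0 / n_nodes] * n_nodes
    weights = [
        math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))
        for i in range(n_nodes)
    ]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    for sigma in (0, 1.0, 10.0, math.inf):
        shares = hotspot_shares(8, sigma, center=3)
        print(sigma, [round(s, 3) for s in shares])
```

The two limiting cases match the quoted description: σ = 0 concentrates the load on one node and σ → ∞ spreads it evenly across all nodes.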

“…Cache coherency, arguably the main source of on-chip communication in shared memory multiprocessors, is currently implemented through directory-based schemes that use multicast to invalidate cache blocks on a shared write. Absorbing the increase of multicast requirements inherent to application scaling [8] or produced by imprecise tracking techniques [9] comes at the cost of increased latency, extra storage overhead or higher protocol complexity. To avoid these trade-offs, alternative schemes eliminate the restrictions imposed by full-bit directories and make intensive use of broadcast instead.…”

Section: Introduction (mentioning)

confidence: 99%

“…Multicast traffic as a function of the number of cores for three coherence schemes. We refer the reader to [8] for simulation details.…”

Section: Introduction (mentioning)

confidence: 99%

Scalability of Broadcast Performance in Wireless Network-on-Chip

Abadal, Mestres, Nemirovsky, et al. 2016

IEEE Trans. Parallel Distrib. Syst.

Self Cite

Abstract: Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. For hundreds or thousands of cores, though, conventional NoCs may not suffice to fulfill the increasing on-chip communication requirements given that the performance of such networks actually drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of communication flows. To assess the feasibility of this approach, the network performance of WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC.

Section: Motivation and Related Work (mentioning)

confidence: 99%