Cited by 25 publications
References 10 publications
“…Hardware support can alleviate the timestamp allocation bottleneck. For example, Tilera processors support remote atomic operations [14] that can increment the timestamp counter without incurring extra cache coherence traffic [2,18]. In practice, this achieves 100 million timestamps per second [11].…”
Section: Timestamp Allocation (mentioning)
confidence: 99%
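As a minimal sketch of the centralized scheme described above, assuming a single shared counter that every transaction increments with one fetch-and-add: the class and function names below are hypothetical, and the lock only stands in for a hardware atomic (such as a remote atomic operation executed at the counter's home cache bank), since Python has no user-level atomic read-modify-write. It shows why every allocation serializes on one location.

```python
import threading


class TimestampCounter:
    """Hypothetical shared timestamp counter: one fetch-and-add per allocation.

    The lock models a hardware atomic; it is not how the hardware does it.
    """

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def fetch_and_add(self, delta: int = 1) -> int:
        # Atomically return the old value and advance the counter by delta.
        with self._lock:
            old = self._value
            self._value += delta
            return old


def run_transactions(counter: TimestampCounter, n_threads: int, per_thread: int) -> list[int]:
    """Each thread allocates per_thread timestamps from the same counter."""
    stamps: list[int] = []
    stamps_lock = threading.Lock()

    def worker() -> None:
        local = [counter.fetch_and_add() for _ in range(per_thread)]
        with stamps_lock:
            stamps.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stamps
```

With four threads allocating 1000 timestamps each, the sorted result is exactly 0 through 3999: every timestamp is unique, but every allocation went through the same counter, which is the bottleneck the quoted passage refers to.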
“…Prior work has proposed hardware and software techniques to increase timestamp allocation throughput, but both approaches have serious limitations. On the hardware side, centralized asynchronous counters [37], remote atomic memory operations [2,18], and fully-synchronized clocks [19] alleviate the timestamp allocation bottleneck, but they are challenging to implement and are not available in current systems. On the software side, coarse-grained timestamp epochs with group commit [35] reduces the frequency of timestamp allocations, but still limits concurrency in common scenarios as we show later.…”
Section: Introduction (mentioning)
confidence: 99%
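To illustrate why coarse-grained epochs reduce allocation frequency, here is a hypothetical sketch (not the cited scheme [35]): a single manager advances a shared epoch counter periodically, transactions only read it, and all transactions tagged with the same epoch commit together as a group. The names and the fixed transactions-per-epoch rule are assumptions chosen for a deterministic example.

```python
from collections import defaultdict


class EpochClock:
    """Hypothetical coarse-grained clock: one shared write per epoch."""

    def __init__(self) -> None:
        self.epoch = 0
        self.writes = 0  # updates to the shared epoch counter

    def advance(self) -> None:
        # Performed periodically by one manager, not by every transaction.
        self.epoch += 1
        self.writes += 1

    def read(self) -> int:
        # Transactions only read the epoch: no read-modify-write per transaction.
        return self.epoch


def simulate(n_txns: int, txns_per_epoch: int) -> tuple[dict[int, list[int]], int]:
    """Tag transactions with the current epoch; each group commits together."""
    clock = EpochClock()
    groups: dict[int, list[int]] = defaultdict(list)
    for txn_id in range(n_txns):
        if txn_id > 0 and txn_id % txns_per_epoch == 0:
            clock.advance()
        groups[clock.read()].append(txn_id)
    return dict(groups), clock.writes
```

For 1000 transactions at 100 per epoch, the shared counter is written 9 times instead of 1000. The cost is that the 100 transactions in an epoch share one timestamp, so the clock alone no longer orders them and each must wait for its group to commit.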
“…The NYU Ultracomputer [29] proposed implementing atomic fetch-and-add using adders in network switches, which could coalesce multiple requests on their way to memory. The Cray T3D [34], T3E [57], and SGI Origin [42] implemented RMOs at the memory controllers, while TilePro64 [30] and recent GPUs [63] implement RMOs in shared caches. Prior work has also proposed adding caches to memory controllers to accelerate RMOs [68] and data-parallel RMOs [5].…”
Section: Hardware Techniques (mentioning)
confidence: 99%
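The request coalescing attributed to the NYU Ultracomputer above can be shown with a small model of fetch-and-add combining: a switch merges F&A(X, a) and F&A(X, b) into a single F&A(X, a + b), and when memory replies with the old value v, the switch returns v to the first requester and v + a to the second. The function and class names are hypothetical; the sketch models only one pairwise combine, not a full network.

```python
def combine(a: int, b: int) -> int:
    # A switch merges F&A(X, a) and F&A(X, b) into one request F&A(X, a + b).
    return a + b


def decombine(reply: int, a: int) -> tuple[int, int]:
    # Memory returned the old value v for the combined request. Serialize the
    # pair as "first, then second": the first sees v, the second sees v + a.
    return reply, reply + a


class Memory:
    """Memory module that executes fetch-and-add on a single word."""

    def __init__(self, value: int) -> None:
        self.value = value

    def fetch_and_add(self, delta: int) -> int:
        old = self.value
        self.value += delta
        return old
```

Starting from X = 10, combining F&A(X, 3) and F&A(X, 5) sends one request to memory; the requesters receive 10 and 13 and X ends at 18, exactly as if the two operations had reached memory one after the other.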
“…In hardware, prior work has mainly focused on remote memory operations (RMOs) [29,30,57,68]. RMO schemes send updates to a single memory controller or shared cache bank instead of having the line ping-pong among multiple private caches, as shown in Fig.…”
Section: Introduction (mentioning)
confidence: 99%
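A toy message-count model makes the ping-pong versus RMO contrast concrete. It assumes, hypothetically, that moving a line to a new owner under invalidation-based coherence costs 3 messages (request, forward or invalidate, data) and that each RMO costs 2 (request to the home bank, reply). These constants are illustrative, not measurements from the cited work.

```python
def ping_pong_messages(trace: list[int], msgs_per_transfer: int = 3) -> int:
    """Line migrates to each writer; a transfer costs msgs_per_transfer (assumed)."""
    owner = None
    msgs = 0
    for core in trace:
        if core != owner:
            # Line must move into this core's private cache.
            msgs += msgs_per_transfer
            owner = core
        # Otherwise the update hits in the private cache: no messages.
    return msgs


def rmo_messages(trace: list[int], msgs_per_rmo: int = 2) -> int:
    """Every update is a request/reply pair to the home bank (assumed)."""
    return len(trace) * msgs_per_rmo
```

For 100 updates issued round-robin by 4 cores, the model gives 300 messages with ping-pong and 200 with RMOs. If one core issues all 100 updates, ping-pong needs only 3 while RMOs still need 200, so under these assumptions RMOs pay off when updates to the line are contended.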
“…Remote word access has been studied in the context of locality-aware directory coherence [18]. Remote atomic operation has been implemented on Tilera processors [19], [20]. Allowing data accesses or computations to happen remotely can reduce the coherence messages and thus improve performance [21].…”
Section: E. Extension: Remote Word Access (mentioning)
confidence: 99%
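One way to decide between a remote word access and caching a line privately is a reuse threshold. The sketch below is a hypothetical policy loosely in the spirit of locality-aware coherence, not the mechanism of [18]: a core's accesses to a line are served remotely at the home bank until that core has touched the line `threshold` times, after which it fetches the line into its private cache. The class name and threshold are assumptions.

```python
from collections import defaultdict


class RemoteAccessPolicy:
    """Hypothetical per-(core, line) policy: remote word access until reuse is shown."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.uses: dict[tuple[int, int], int] = defaultdict(int)

    def access(self, core: int, line: int) -> str:
        key = (core, line)
        self.uses[key] += 1
        if self.uses[key] >= self.threshold:
            return "private"  # enough reuse: worth moving the whole line
        return "remote"       # low reuse so far: one word round-trip to the home bank
```

With a threshold of 3, a core's first two accesses to a line go remote and later ones use a private copy; a line touched only once or twice per core never generates a full line transfer or the invalidations that would follow it.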