Measuring the effects of thread placement on the Kendall Square KSR1

Apon, Amy; Wagner, Thomas; Smirni, Evgenia; Madhukar, Manish; Dowdy, Lawrence W.

doi:10.2172/10183335

Cited by 5 publications

(2 citation statements)

References 1 publication

“…We have observed that reads leading to coherence misses on write-invalidate architectures (interprocess reads) occur in bursts in simple programs like sort and complex application programs like the Splash benchmarks [12]. We also have verified the suggestion of [15] and [9] that programs with identical average read sharing characteristics can vary significantly in execution time due to differences in sharing at small time scales. This work involved a cache-only memory architecture that included a form of cache update called read-broadcast.…”

Section: Introduction (mentioning)

confidence: 66%

A Metric for the Temporal Characterization of Parallel Programs

Rodriguez; Jordan; Alaghband. 1997. Journal of Parallel and Distributed Computing

“…Wagner et al [7] studied the impact of thread placement on performance using a set of synthetic benchmarks. They conducted experiments on a two-ring 64-processor system.…”

Section: Relevant Work (mentioning)

confidence: 99%

Latency hiding on COMA multiprocessors

Abdelrahman. 1996. J Supercomput

Cache Only Memory Access (COMA) multiprocessors support scalable coherent shared memory with a uniform memory access programming model. The cache-based organization of memory results in long memory access latencies. Latency hiding mechanisms can reduce effective memory latency by making data present in a processor's local memory by the time the data is needed. In this paper, we study the effectiveness of latency hiding mechanisms on the KSR2 multiprocessor in improving the performance of three programs. The communication patterns of each program are analyzed and mechanisms for latency hiding are applied. Results from a 52-processor system indicate that the use of these mechanisms hides a significant portion of remote memory accesses and that application performance benefits. The overhead associated with the use of these mechanisms can limit the extent of this benefit.