2006
DOI: 10.1145/1150019.1136513

Area-Performance Trade-offs in Tiled Dataflow Architectures

Abstract: Tiled architectures, such as RAW, SmartMemories, TRIPS, and WaveScalar, promise to address several issues facing conventional processors, including complexity, wire-delay, and performance. The basic premise of these architectures is that larger, higher-performance implementations can be constructed by replicating the basic tile across the chip. This paper explores the area-performance trade-offs when designing one such tiled architecture, WaveScalar. We use a synthesizable RTL model and cycle-level simulator to…

Cited by 19 publications (20 citation statements)
References 28 publications
“…The TRIPS architecture [21], [20] is an instantiation of an EDGE ISA which utilizes large cores consisting of a matrix of execution units. In [22], the authors explore the area-performance trade-offs of a tiled data-flow architecture. A tiled architecture promises to address several issues facing conventional processors, such as complexity and performance.…”
Section: Related Work (mentioning)
confidence: 99%
“…The WaveScalar processor has a similar philosophy and execution model as TRIPS, but uses a hierarchy of interconnection networks to pass operands between processing elements [9]. Operands are broadcast within the eight processing elements making up one domain.…”
Section: Related Work (mentioning)
confidence: 99%
“…We scheduled nine sample applications from the Spec2000 [35] and Splash2 [4] benchmark suites (art, equake, gzip, mcf, radix, twolf and fft, lu, ocean, respectively). The cycle-level simulator used for this study is tuned to match the latencies, resources, and restrictions of an RTL implementation [37] of the architecture. Table 2 shows the average performance of each of these nine schedules.…”
Section: Experimental Evaluation (mentioning)
confidence: 99%
“…A simple PE decreases both design and verification time; PE replication provides robustness in the face of fabrication errors; and the combination reduces wire delay for both data and control signal transmission. The result is a scalable architecture that allows a chip designer to target different levels of performance, with different area budgets [37].…”
Section: Introduction (mentioning)
confidence: 99%