Performance of FPGA-based token dataflow architectures is often limited by the long-tail distribution of parallelism along the compute paths of dataflow graphs. This is known to limit the speedup of dataflow processing of sparse LU factorization to only 3-10× over CPUs. In this paper, we show how to overcome these limitations by exploiting criticality information along compute paths, both statically during graph pre-processing and dynamically at runtime. We statically restructure high-fanin dataflow chains using a technique inspired by Huffman encoding, providing faster routes for late-arriving inputs as predicted by our timing models. We also perform fanout decomposition and selective node replication to distribute serialization costs across multiple PEs. The overhead of this static restructuring is small, roughly the cost of a single iteration, and is amortized across thousands of LU iterations at runtime. Additionally, we modify the dataflow firing rule in hardware to prefer critical nodes when multiple nodes are ready for evaluation. We compute this criticality offline through a one-time slack analysis and implement it in hardware at virtually no cost through a trivial address encoding ordered by criticality. For dataflow graphs extracted from sparse LU factorization, we demonstrate up to 2.5× (mean 1.21×) improvement using static pre-processing alone, up to 2.4× (mean 1.17×) improvement using runtime optimizations alone, and an overall improvement of up to 2.9× (mean 1.39×) when both static and runtime optimizations are enabled, across a range of benchmark problems.
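The Huffman-inspired restructuring can be illustrated with a minimal sketch (this is an illustration of the general idea, not the paper's implementation; the function name, unit operator latency, and greedy tree construction are assumptions): given predicted arrival times for the inputs of a high-fanin associative reduction, repeatedly combine the two earliest-ready operands, so that late-arriving inputs land near the root of the tree and traverse fewer operators.

```python
import heapq

def restructure_reduction(arrival_times, op_latency=1.0):
    """Rebuild a high-fanin associative reduction as a Huffman-style tree.

    Analogous to Huffman coding, where low-frequency symbols get long
    codes, early-arriving inputs are placed deep in the tree and
    late-arriving inputs near the root, shortening their route.

    arrival_times : predicted ready time of each leaf input (assumed
                    to come from a timing model, as in the paper).
    op_latency    : assumed uniform latency of one combining operator.
    Returns (tree, completion_time); tree is a nested tuple of leaf indices.
    """
    # heap entries: (ready_time, tie_breaker, subtree)
    heap = [(t, i, i) for i, t in enumerate(arrival_times)]
    heapq.heapify(heap)
    while len(heap) > 1:
        t_a, _, a = heapq.heappop(heap)  # earliest-ready operand
        t_b, _, b = heapq.heappop(heap)  # next-earliest operand
        # the combining operator fires once both operands have arrived
        done = max(t_a, t_b) + op_latency
        heapq.heappush(heap, (done, id((a, b)), (a, b)))
    t_root, _, tree = heap[0]
    return tree, t_root
```

For example, with arrival times `[0, 0, 0, 6]`, the three early inputs are reduced while waiting for the late one, which then passes through a single operator, completing at time 7; a chain or balanced tree that places the late input deeper would finish later.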