Application-Specific Processing on a General-Purpose Core via Transparent Instruction Set Customization

Clark, Nathan; Kudlur, Manjunath; Park, Hyunchul; Mahlke, Scott; Flautner, Krisztián

doi:10.1109/micro.2004.5

Cited by 154 publications

(131 citation statements)

References 37 publications

Mentioning: 115

“…The new accelerator is also ideal for applications that do not operate on 32-bit inputs, such as video compression applications. CCA, as proposed in [8], is used as a baseline accelerator in this work. In essence, CCA is combinational acyclic accelerator consisting of a set of function units organized as a matrix.…”

Section: Design of a Reduced CCA (mentioning)

confidence: 99%

“…In other words, this type of accelerator can be treated as an addition to the execution stage in the main pipeline, but with more computational power. Figures 1 and 2 show a CCA, proposed in [8], and how it is integrated into the pipeline from a high level. This extra FU can execute dataflow subgraphs of instructions in one cycle.…”

Section: Pipeline Organization (mentioning)

confidence: 99%

“…One problem with pipelining is that it would make latency for all subgraphs longer, significantly impacting performance gains from the CCA. Previous work showed that moving from the original CCA to 2-stage pipelined CCA eliminates around 30% of the performance improvement [8]. Another problem is that pipelining does not reduce the die area of CCA; it would actually make the problem worse due to pipeline registers.…”

Section: Width-aware Narrow CCA (mentioning)

confidence: 99%

“…This leaves much room for improvement when looking at the problem from the applications' perspective. Work by Clark et al [8] took a slightly different approach, designing an accelerator targeting important computation patterns, whether or not they were easily supported in hardware. This accelerator, termed a configurable compute accelerator (CCA), offers the promise of more efficiency if the hardware can be constructed.…”

Section: Introduction (mentioning)

confidence: 99%


Exploiting Narrow Accelerators with Data-Centric Subgraph Mapping

Hormati, Clark, Mahlke. 2007.

International Symposium on Code Generation and Optimization (CGO'07)

“…By properly configuring these muxes, it is possible to make the compound circuit behave as any of the original individual circuits. However, this configurable logic and the additional links are kept to a minimum; as a result, there is no generic routing network such as in traditional FPGA, or in coarser grain configurable accelerator like in [11,27,5] where all possible connections between operators have to be implemented with an expensive network of switches or multiplexers. Previous research proposed to merge more than one hardware datapath to map several specialized instructions on the same hardware such as in [7,28,21] in the context of ASIP design but none of them leverage both data flow and control flow to allow full loop accelerator aggregation.…”

Section: Introduction and Related Work (mentioning)

confidence: 99%

Reconciling specialization and flexibility through compound circuits

Yehia, Girbal, Berry, et al. 2009.

2009 IEEE 15th International Symposium on High Performance Computer Architecture

Custom Instruction Generation Using Temporal Partitioning Techniques for a Reconfigurable Functional Unit

Mehdipour, Noori, Zamani, et al. 2006.

Embedded and Ubiquitous Computing

Extracting appropriate custom instructions is an important phase for implementing an application on an extensible processor with a reconfigurable functional unit (RFU). Custom instructions (CIs) are usually extracted from critical portions of applications. It may not be possible to meet all of the RFU constraints when CIs are generated. This paper addresses the generation of mappable CIs on an RFU. In this paper, our proposed RFU architecture for an adaptive dynamic extensible processor is described. Then, an integrated framework for temporal partitioning and mapping is presented to partition and map the CIs on RFU. In this framework, two mapping aware temporal partitioning algorithms are used to generate CIs. Temporal partitioning iterates and modifies partitions incrementally to generate CIs. Using this framework brings about more speedup for the extensible processor.
