37th International Symposium on Microarchitecture (MICRO-37'04)
DOI: 10.1109/micro.2004.5

Application-Specific Processing on a General-Purpose Core via Transparent Instruction Set Customization

Abstract: Application-specific instruction set extensions are an effective way of improving the performance of processors. Critical computation subgraphs can be accelerated by collapsing them into new instructions that are executed on specialized function units. Collapsing the subgraphs simultaneously reduces the length of computation as well as the number of intermediate results stored in the register file. The main problem with this approach is that a new processor must be generated for each application domain. While …
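
As a rough illustration of the collapsing idea described in the abstract, the Python sketch below models a short straight-line sequence of operations and replaces a chosen subgraph with a single fused operation. The `Op` record, the `collapse` helper, and the example operations are all hypothetical names introduced here for illustration; they are not taken from the paper, and the sketch assumes the subgraph has exactly one result that is used outside it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    dest: str     # name of the value this operation produces
    opcode: str   # e.g. "add", "shl", "xor", or "fused"
    srcs: tuple   # names of the values it consumes


def collapse(ops, subgraph_dests):
    """Replace the ops whose dest is in subgraph_dests with one fused op.

    Hypothetical helper. Assumes ops are in program order and that the
    subgraph has exactly one value consumed outside it (a single live-out).
    """
    inside = [op for op in ops if op.dest in subgraph_dests]
    outside = [op for op in ops if op.dest not in subgraph_dests]

    # Inputs of the fused op: sources read inside but produced elsewhere.
    inputs = []
    for op in inside:
        for src in op.srcs:
            if src not in subgraph_dests and src not in inputs:
                inputs.append(src)

    # Live-outs: values produced inside and read by an outside op.
    used_outside = {src for op in outside for src in op.srcs}
    live_out = [op.dest for op in inside if op.dest in used_outside]
    if len(live_out) != 1:
        raise ValueError("this sketch only handles a single live-out value")

    fused = Op(live_out[0], "fused", tuple(inputs))

    # Emit the fused op where the last collapsed op used to be.
    last_inside = inside[-1]
    result = []
    for op in ops:
        if op is last_inside:
            result.append(fused)
        elif op.dest not in subgraph_dests:
            result.append(op)
    return result


# Example: t1, t2, t3 form the subgraph; only t3 is used afterwards.
program = [
    Op("t1", "add", ("a", "b")),
    Op("t2", "shl", ("t1", "c")),
    Op("t3", "xor", ("t2", "a")),
    Op("r", "sub", ("t3", "d")),
]
collapsed = collapse(program, {"t1", "t2", "t3"})
```

In this example the four-operation sequence becomes two operations, and the intermediate values `t1` and `t2` are never written to a named destination, which mirrors the two benefits the abstract names: a shorter computation and fewer intermediate results in the register file.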

Cited by 154 publications (131 citation statements)
References 37 publications
“…The new accelerator is also ideal for applications that do not operate on 32-bit inputs, such as video compression applications. CCA, as proposed in [8], is used as a baseline accelerator in this work. In essence, CCA is combinational acyclic accelerator consisting of a set of function units organized as a matrix.…”
Section: Design of a Reduced CCA (mentioning)
confidence: 99%
“…In other words, this type of accelerator can be treated as an addition to the execution stage in the main pipeline, but with more computational power. Figures 1 and 2 show a CCA, proposed in [8], and how it is integrated into the pipeline from a high level. This extra FU can execute dataflow subgraphs of instructions in one cycle.…”
Section: Pipeline Organization (mentioning)
confidence: 99%
“…By properly configuring these muxes, it is possible to make the compound circuit behave as any of the original individual circuits. However, this configurable logic and the additional links are kept to a minimum; as a result, there is no generic routing network such as in traditional FPGA, or in coarser grain configurable accelerator like in [11,27,5] where all possible connections between operators have to be implemented with an expensive network of switches or multiplexers. Previous research proposed to merge more than one hardware datapath to map several specialized instructions on the same hardware such as in [7,28,21] in the context of ASIP design but none of them leverage both data flow and control flow to allow full loop accelerator aggregation.…”
Section: Introduction and Related Work (mentioning)
confidence: 99%