Code generation for fixed-point DSPs

Araújo, Guido; Malik, Sharad

doi:10.1145/290833.290837

Cited by 20 publications

(13 citation statements)

References 24 publications


“…We have also given general, tight expressions for the CBP parameters of a number of additional practical DSP building blocks, which were obtained by analyzing implementations in the DSP libraries provided within the Ptolemy design environment [32]. Useful directions for further study include investigating tools to help automate the derivation of tight CBP parameters; integrating CBP-based buffering analysis, multidimensional dataflow modeling [34], and cyclo-static dataflow principles [26], which appear to have strong synergistic inter-relationships; systematically accounting for CBP parameters in the context of memory bound derivation (derivations of efficiently-computable upper bounds on memory requirements) [8]; and understanding the impact of CBP-based buffer optimization on retiming/vectorization transformations [35][36][37] for throughput optimization under memory capacity constraints. …”

Section: Discussion (mentioning)

confidence: 99%

“…Similarly, a compiler for a general-purpose HLL (such as C) typically does not have the global information about application structure that our allocator has. The techniques we develop in this paper are thus complementary to the work that is being done on developing better HLL compilers for DSPs (e.g., see [8][9][10][11][12]). In particular, the techniques we develop operate on the graphs at a high enough level that particular architectural features of the target processor are largely irrelevant.…”

Section: Introduction (mentioning)

confidence: 99%

The CBP Parameter: A Module Characterization Approach for DSP Software Optimization

Bhattacharyya; Murthy

2004

The Journal of VLSI Signal Processing-Systems for Signal, Image

Abstract. Memory consumption is an important metric for DSP software implementation. In this paper, we develop a module characterization technique that promotes more economical use of memory resources at the system level. Our work is developed in the context of software synthesis from signal/video/image processing applications expressed as synchronous dataflow (SDF) graphs. SDF is a restricted form of dataflow where each computational module (actor) consumes and produces a fixed number of data values (tokens) on each execution. Usually, no assumption is made about when, during the execution of an actor, the tokens are actually consumed and produced; the firing of an actor is treated as an atomic event for most purposes. However, we show in this paper that it is possible to concisely and precisely capture key properties pertaining to the relative times at which tokens are produced and consumed by an actor. We show this by introducing the consumed-before-produced (CBP) parameter, which provides a general method for characterizing the token transfer of an SDF actor. Good bounds on the CBP parameter can aid an SDF compiler in performing more aggressive optimizations for reducing buffer sizes on the edges between actors. We formally define the CBP parameter; derive some useful properties of this parameter; illustrate how the value of the parameter is derived by examining in detail the multirate FIR filter, which is a fundamental actor in multirate signal processing applications; and examine CBP parameterizations for several other practical SDF actors.

“…One prominent example of a compiler study targeting DSPs may be that of Araujo and Malik [1998], who proposed a linear-time optimal algorithm for instruction selection, register allocation, and instruction scheduling for expression trees. Like most other previous studies for DSPs, their algorithm was not designed specifically for the multimemory bank DSPs.…”

Section: Previous Work (mentioning)

confidence: 99%

Fast memory bank assignment for fixed-point digital signal processors

Cho; Paek; Whalley

2004

ACM Trans. Des. Autom. Electron. Syst.

Most vendors of digital signal processors (DSPs) support a Harvard architecture, which has two or more memory buses, one for program and one or more for data, and allows the processor to access multiple words of data from memory in a single instruction cycle. Also, many existing fixed-point DSPs are known to have an irregular architecture with heterogeneous registers, which contains multiple register files that are distributed and dedicated to different sets of instructions. Although there have been several studies conducted to efficiently assign data to multimemory banks, most of them assumed processors with relatively simple, homogeneous general-purpose registers. Thus, several vendor-provided compilers for DSPs that we examined were unable to efficiently assign data to multiple data memory banks, thereby often failing to generate highly optimized code for their machines. As a consequence, programmers for these DSPs often manually assign program variables to memories so as to fully utilize multimemory banks in their code. This paper reports on our recent attempt to address this problem by presenting an algorithm that helps the compiler to efficiently assign data to multimemory banks. Our algorithm differs from previous work in that it assigns variables to memory banks in separate, decoupled code generation phases, instead of a single, tightly coupled phase. The experimental results have revealed that our decoupled algorithm greatly simplifies our code generation process; thus our compiler runs extremely fast, yet generates target code that is comparable in quality to the code generated by a coupled approach.

“…For instance, according to [Zivojnovic 1994], about 55% of instructions in the code for the Motorola DSP56k processor are copies, which are unusually high, as compared to the case of GPPs. This fact indicates that the code quality in a heterogeneous register architecture would rely on how efficiently to minimize such copy instructions [Araujo 1998]. Unfortunately, the register coalescing problem is more complex for the heterogeneous register architecture than the homogeneous one.…”

Section: Introduction (mentioning)

confidence: 99%

Register coalescing techniques for heterogeneous register architecture with copy sifting

Ahn; Paek

2009

ACM Trans. Embed. Comput. Syst.

Optimistic coalescing has been proven as an elegant and effective technique that provides better chances of safely coloring more registers in register allocation than other coalescing techniques. Its algorithm originally assumes homogeneous registers, which are all gathered in the same register file. Although this register architecture is still common in most general-purpose processors, embedded processors often contain heterogeneous registers, which are scattered in physically different register files dedicated for each dissimilar purpose and use. In this work, we show that optimistic coalescing is also useful for an embedded processor to better handle such heterogeneity of the register architecture, and develop a modified algorithm for optimal coalescing that helps a register allocator. In the experiment, an existing register allocator was able to achieve up to 13.0% reduction in code size through our coalescing, and avoid many spills that would have been generated without our scheme.

Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors-Code generation, compiler and optimization

General Terms: Algorithms, Performance, Design, Experimentation

Additional Key Words and Phrases: Register allocation, register coalescing, compiler, embedded processors, heterogeneous register architecture

Part of this work was published in LCTES 2007. New contributions added to this article include Section 4.3, which describes the coloring heuristic which was applied to our modified optimistic coalescing; Sections 4.4 and 4.5, which propose two new techniques for further reducing spills in our modified optimistic coalescing; and Section 5, which extensively analyzes the impact of our coalescing technique with fuller benchmark codes.
