Joe Zbiciak scite author profile

2001

The TMS320C6000 architecture is a leading family of Digital Signal Processors (DSPs). To achieve peak performance, this VLIW architecture relies heavily on software pipelining. Traditionally, software pipelining has been restricted to regular (FOR) loops. More recently, software pipelining has been extended to irregular (WHILE) loops, but only on architectures that provide special-purpose hardware such as rotating (predicate and general-purpose) register files, specific instructions for filling/draining software pipelined loops, and possibly hardware support for speculative code motion. In contrast, the TMS320C6000 family has a limited, static register file and no specialized hardware beyond the ability to predicate instructions using a few static registers. In this paper, we describe our experience extending a production compiler for the TMS320C6000 family to software pipeline irregular loops. We discuss our technique for preprocessing irregular loops so that they can be handled by the existing software pipeliner. Our approach is much simpler than previous approaches and works very well in the presence of the DSP applications and the target architecture which characterize our environment. With this optimization, we achieve impressive speedups on several key DSP and non-DSP algorithms.

show abstract

Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture

Granston

¹

,

Stotzer

²

,

Zbiciak

³

2001

View full text Add to dashboard Cite

The TMS320C6000 architecture is a leading family of Digital Signal Processors (DSPs). To achieve peak performance, this VLIW architecture relies heavily on software pipelining. Traditionally, software pipelining has been restricted to regular (FOR) loops. More recently, software pipelining has been extended to irregular (WHILE) loops, but only on architectures that provide special-purpose hardware such as rotating (predicate and general-purpose) register files, specific instructions for filling/draining software pipelined loops, and possibly hardware support for speculative code motion. In contrast, the TMS320C6000 family has a limited, static register file and no specialized hardware beyond the ability to predicate instructions using a few static registers. In this paper, we describe our experience extending a production compiler for the TMS320C6000 family to software pipeline irregular loops. We discuss our technique for preprocessing irregular loops so that they can be handled by the existing software pipeliner. Our approach is much simpler than previous approaches and works very well in the presence of the DSP applications and the target architecture which characterize our environment. With this optimization, we achieve impressive speedups on several key DSP and non-DSP algorithms.

show abstract

<title>C64x VelociTI.2 extensions support media-rich broadband infrastructure and image analysis systems</title>

Golston

¹

,

Hoyle

²

,

Markandey

³

et al. 2001

3

0

View full text Add to dashboard Cite

show abstract

Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture

Granston

¹

,

Stotzer

²

,

Zbiciak

³

2001

SIGPLAN Not.

0

View full text Add to dashboard Cite

The TMS320C6000 architecture is a leading family of Digital Signal Processors (DSPs). To achieve peak performance, this VLIW architecture relies heavily on software pipelining. Traditionally, software pipelining has been restricted to regular (FOR) loops. More recently, software pipelining has been extended to irregular (WHILE) loops, but only on architectures that provide special-purpose hardware such as rotating (predicate and general-purpose) register files, specific instructions for filling/draining software pipelined loops, and possibly hardware support for speculative code motion. In contrast, the TMS320C6000 family has a limited, static register file and no specialized hardware beyond the ability to predicate instructions using a few static registers. In this paper, we describe our experience extending a production compiler for the TMS320C6000 family to software pipeline irregular loops. We discuss our technique for preprocessing irregular loops so that they can be handled by the existing software pipeliner. Our approach is much simpler than previous approaches and works very well in the presence of the DSP applications and the target architecture which characterize our environment. With this optimization, we achieve impressive speedups on several key DSP and non-DSP algorithms.

show abstract