2006
DOI: 10.1007/s10766-006-0015-0

Avoiding Conversion and Rearrangement Overhead in SIMD Architectures

Abstract: Single-Instruction Multiple-Data (SIMD) instructions provide an inexpensive way to exploit the data-level parallelism in multimedia applications. However, the performance improvement obtained from SIMD instructions is often limited because many overhead instructions are required to bring data into a form amenable to SIMD processing. In this paper, we employ two techniques to overcome this limitation. The first technique, extended subwords, uses four extra bits for every byte in a media registe…
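
The conversion overhead mentioned in the abstract can be illustrated with a small scalar sketch. It assumes that an extended subword is 12 bits wide (one byte plus the four extra bits the abstract mentions) and that it is interpreted as a signed value; both the signed interpretation and all helper names below are assumptions made for illustration, not details taken from the paper. The example uses the initial Paeth estimate `a + b - c` on unsigned 8-bit pixels, whose intermediate result does not fit in a byte and would normally force an unpack to 16-bit subwords.

```python
def signed_range(bits):
    """Inclusive (lo, hi) range of a two's-complement subword of `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits_signed(value, bits):
    """True if `value` is representable in a signed subword of `bits` bits."""
    lo, hi = signed_range(bits)
    return lo <= value <= hi


def paeth_initial(a, b, c):
    """Initial Paeth estimate a + b - c for unsigned 8-bit pixel values."""
    return a + b - c


BYTE_BITS = 8
# Assumption: an extended subword is a byte plus four extra bits.
EXTENDED_BITS = BYTE_BITS + 4

# Extreme intermediate values over all unsigned 8-bit inputs.
worst_low = paeth_initial(0, 0, 255)
worst_high = paeth_initial(255, 255, 0)

# The intermediate overflows a byte but stays inside a 12-bit signed lane,
# so under these assumptions no conversion to 16-bit subwords is needed.
needs_conversion_with_bytes = not fits_signed(worst_high, BYTE_BITS)
fits_in_extended = (fits_signed(worst_low, EXTENDED_BITS)
                    and fits_signed(worst_high, EXTENDED_BITS))
```

Under these assumptions the intermediate stays in the same lane layout as the input bytes, which is the kind of conversion step the abstract describes as overhead.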

Cited by 4 publications (3 citation statements)
References 25 publications

“…This phase of the 2-D DWT is difficult to vectorize efficiently because the elements within a register need to be rearranged, incurring substantial overhead. Techniques we are considering include providing support for packed multiply-accumulate instructions for floating-point values (MMX/SSE provides such instructions but only for integer data) and the matrix register file (MRF) [23], which is a (micro-)architectural technique to efficiently support matrix transposition.…”
Section: Discussion (mentioning)
confidence: 99%
“…We have evaluated our proposed techniques in a previous paper [Shahbahrami et al. 2006a] using some 2-D multimedia kernels, such as 2-D discrete cosine transform (DCT) and its inverse (IDCT), Paeth prediction, 2 × 2 Haar transform and its inverse, vector/matrix multiplication, matrix transpose, and addition of two images. Figure 2 eliminates the matrix transposition step which is required in some kernels, for instance, 2-D (I)DCT and vector/matrix multiplication.…”
Section: Related Work (mentioning)
confidence: 99%
“…In addition, we discuss the new SIMD instructions and provide a preliminary evaluation of the hardware cost of the proposed techniques. More details about the MMMX architecture can be found in previous work [Shahbahrami et al. 2006a, 2006b, 2006c].…”
Section: Architecture (mentioning)
confidence: 99%
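
The citation statements above name matrix transposition as a source of rearrangement overhead in 2-D kernels. A minimal sketch of where that overhead comes from, assuming a generic SIMD instruction set with low/high interleave ("unpack") operations in the style of MMX/SSE: each 4-element list stands in for one register, and the functions `unpack_lo`, `unpack_hi`, and `transpose4x4` are hypothetical names for illustration. This models the conventional instruction sequence, not the matrix register file or any other technique proposed in the cited papers.

```python
def unpack_lo(x, y):
    """Interleave the low halves of two equal-length registers."""
    half = len(x) // 2
    out = []
    for xe, ye in zip(x[:half], y[:half]):
        out += [xe, ye]
    return out


def unpack_hi(x, y):
    """Interleave the high halves of two equal-length registers."""
    half = len(x) // 2
    out = []
    for xe, ye in zip(x[half:], y[half:]):
        out += [xe, ye]
    return out


def transpose4x4(rows):
    """Transpose a 4x4 matrix held in four 4-element registers.

    Uses two stages of four interleave operations each, so eight
    rearrangement operations are spent without doing any arithmetic.
    """
    r0, r1, r2, r3 = rows
    # Stage 1: pair rows 0/2 and rows 1/3.
    u0, u1 = unpack_lo(r0, r2), unpack_hi(r0, r2)
    u2, u3 = unpack_lo(r1, r3), unpack_hi(r1, r3)
    # Stage 2: interleave the stage-1 results to form the columns.
    return [unpack_lo(u0, u2), unpack_hi(u0, u2),
            unpack_lo(u1, u3), unpack_hi(u1, u3)]


matrix = [[0, 1, 2, 3],
          [4, 5, 6, 7],
          [8, 9, 10, 11],
          [12, 13, 14, 15]]
transposed = transpose4x4(matrix)
```

In this model every one of the eight interleave steps only moves data, which is why kernels that need a transpose between their row and column passes pay a noticeable cost for it.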