2008
DOI: 10.1093/ietisy/e91-d.12.2902

Cache Optimization for H.264/AVC Motion Compensation

Abstract: In this letter, we propose a cache organization that substantially reduces the memory bandwidth of motion compensation (MC) in H.264/AVC decoders. To reduce duplicated memory accesses to P and B pictures, we employ a four-way set-associative cache whose index bits are composed of horizontal and vertical address bits of the frame buffer, and whose lines each store an 8 × 2 pixel block of the reference frames. Moreover, we alleviate the data fragmentation problem by selecting a line size that equals …
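The indexing scheme described in the abstract can be sketched as follows. The 8 × 2 tile per cache line and the four-way associativity come from the abstract; the number of sets and the exact horizontal/vertical bit split below are illustrative assumptions, not the paper's values:

```python
# Sketch of a 2-D cache index for an MC reference-pixel cache.
# From the abstract: each line holds an 8x2 pixel tile, and the set
# index mixes horizontal and vertical frame-buffer address bits.
# Assumed for illustration: 64 sets, split as 3 horizontal + 3 vertical bits.

TILE_W, TILE_H = 8, 2      # pixels per cache line (from the abstract)
H_BITS, V_BITS = 3, 3      # assumed index-bit split (64 sets total)

def cache_set_index(x: int, y: int) -> int:
    """Map a pixel coordinate in the reference frame to a cache set.

    The tile coordinate is (x // 8, y // 2); the low bits of the
    horizontal and vertical tile coordinates are concatenated, so
    tiles that are neighbours in either direction land in different
    sets and overlapping MC reads do not thrash one set.
    """
    tile_x = x // TILE_W
    tile_y = y // TILE_H
    h_part = tile_x & ((1 << H_BITS) - 1)
    v_part = tile_y & ((1 << V_BITS) - 1)
    return (v_part << H_BITS) | h_part

# Horizontally and vertically adjacent tiles map to distinct sets.
assert cache_set_index(0, 0) != cache_set_index(8, 0)
assert cache_set_index(0, 0) != cache_set_index(0, 2)
```

Pixels inside the same 8 × 2 tile share one line, which is what lets overlapping reads for neighbouring sub-blocks hit in the cache instead of going back to the frame buffer.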

Citing publications span 2010–2023.


Cited by 4 publications (8 citation statements)
References 7 publications (9 reference statements)
“…by implementing special instructions for the FIFO access. Additionally, we should also decrease T_T by the following schemes: (1) adopting a high-bandwidth DRAM and an efficient SDRAM controller [4], (2) using a wider memory data bus, and (3) linking multiple DMA transfers for the inter-prediction of a macroblock, where each DMA transfer is arranged to deliver the minimum number of pixels for its corresponding sub-block of the macroblock in H.264 [6]. Scheme (3) can reduce the required memory bandwidth substantially because the data transfer for inter-prediction occupies 73.4% of the total bandwidth of DMA data transfers, which corresponds to 59.6% of the total bandwidth to the SDRAM.…”
Section: Results
confidence: 99%
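The two percentages quoted in this statement together imply what share of total SDRAM traffic the DMA transfers as a whole account for; a minimal check of that arithmetic:

```python
# Figures taken directly from the citation statement above:
inter_pred_share_of_dma = 0.734    # inter-prediction share of DMA transfer bandwidth
inter_pred_share_of_sdram = 0.596  # the same traffic as a share of total SDRAM bandwidth

# If the same inter-prediction traffic is 73.4% of DMA bandwidth but only
# 59.6% of SDRAM bandwidth, then DMA transfers overall must account for
# 0.596 / 0.734 of SDRAM traffic, i.e. roughly 81%.
dma_share_of_sdram = inter_pred_share_of_sdram / inter_pred_share_of_dma
assert abs(dma_share_of_sdram - 0.812) < 0.001
```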
“…Since the luma 4 × 4 block represents the most demanding case with respect to memory accesses [3] and computational intensity for q-pel MC, the focus will be put on this type of block and its associated operations to prove the efficiency of the proposed method for a standard H.264 decoder.…”
Section: Problem Definition
confidence: 99%
“…Wang [2] and Yoon [3] concluded that MC requires 75% of all memory accesses in an H.264 decoder, in contrast with only 10% required for storing the frames. This high memory-access ratio of the MC module demands highly optimized memory accesses to improve the overall performance of the decoder.…”
Section: Introduction
confidence: 99%