Applying CUDA Architecture to Accelerate Full Search Block Matching Algorithm for High Performance Motion Estimation in Video Encoding

Monteiro, Eduarda; Vizzotto, Bruno Boessio; Diniz, Cláudio M.; Bampi, Sérgio

doi:10.1109/sbac-pad.2011.19

Cited by 7 publications

(4 citation statements)

References 6 publications


“…Therefore, most of the parallel ME research work on many-core platforms concentrates on the full-search method, which is inherently highly parallel. In Chen and Hang [15]; Cheng et al. [16]; Lee and Oh [17]; Monteiro et al. [18], the parallel full-search method is implemented on the GPU platform, and speed-ups of about 10-100x are obtained compared with the serial full-search method on a single CPU core. Although the speed-up of the full-search method is high on the GPU platform, its performance advantage is not obvious compared with the serial fast search method in HEVC or H.264/AVC on a single CPU core.…”

Section: Related Work (mentioning)

confidence: 99%

A Highly Parallel and Scalable Motion Estimation Algorithm with GPU for HEVC

Xue, Ren, et al. 2017

Scientific Programming


We propose a highly parallel and scalable motion estimation algorithm, named multilevel resolution motion estimation (MLRME for short), that combines the advantages of local full search and downsampling. Subsampling a video frame saves a large amount of computation, while the local full-search method exploits massive parallelism and makes full use of powerful modern many-core accelerators, such as GPUs and the Intel Xeon Phi. We integrated the proposed MLRME into HM12.0, and the experimental results showed that the encoding quality of the MLRME method is close to that of the fast motion estimation in HEVC, declining by less than 1.5%. We also implemented MLRME with CUDA, which obtained a 30-60x speed-up compared to the serial algorithm on a single CPU. Specifically, the parallel implementation of MLRME on a GTX 460 GPU can meet the real-time coding requirement with about 25 fps for the 2560 × 1600 video format, while for 832 × 480 the performance is more than 100 fps.
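As a rough illustration of the full-search idea this abstract builds on, the sketch below exhaustively evaluates every displacement in a search window using the sum of absolute differences (SAD). It is a minimal serial example in plain Python, not the MLRME algorithm or the CUDA kernels of any paper listed here; the function names `sad` and `full_search`, the SAD cost, and the toy frames are assumptions made for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Return (best_sad, (dx, dy)) over every displacement in the window,
    or None if no candidate fits inside the reference frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue
            # Each candidate cost is independent of the others, which is the
            # property a parallel version could exploit (hypothetically, one
            # thread per candidate followed by a minimum reduction).
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Toy frames (assumed data): `cur` is `ref` shifted by 2 pixels in x and 1 in y.
ref = [[7 * x + 13 * y for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
        for x in range(8)] for y in range(8)]

print(full_search(cur, ref, 2, 2, 4, 2))
```

Because no candidate depends on another, the two displacement loops are the natural place to parallelize; the serial version here only makes the cost structure explicit.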


“…Initial efforts for the parallelization of the block matching algorithms in old parallel processing platforms have been presented in [1,11,12,14,15,16]. Specifically, as far as the HS algorithm is concerned, very few research works have been published, referring to systolic arrays [5,11,12] and not modern parallelization frameworks.…”

Section: Related Work (mentioning)

confidence: 99%

“…They have used either OpenMP or OpenMPI or GPU [16], or custom architectures [15] or special FPGA architectures [14]. None of them have ever tried to parallelize a block matching algorithm on an existing high performance embedded system, like a smart mobile phone.…”

Section: Introduction (mentioning)

confidence: 99%

Parallelization of the hierarchical search in Python for high performance embedded systems

Radoglou-Grammatikis, Evdoxia, Dasygenis 2016

2016 5th International Conference on Modern Circuits and Systems Technologies (MOCAST)


The number of high performance embedded systems used for multimedia applications, such as video encoding or decoding, has grown rapidly. A key component in video encoding is motion estimation, which exhibits high computational complexity and hard-to-meet deadlines. The most popular technique for motion estimation is block matching. The hierarchical search (HS) is a popular and very fast block matching algorithm that achieves the best image quality, but with a very high computational complexity. This complexity is usually handled using parallelization. Our work differs from that of other authors because it targets parallelization on embedded systems using Python, specifically its multiprocessing module. The experimental results on parallelizing the HS algorithm on a high performance multi-core embedded system illustrate the usefulness of our methodology, with a speedup of up to 1.4.


“…There are many methods based on various approaches, including gray-based [3], frequency-based [4], or feature-based [5] methods. Also, motion estimation can be computed in 2D [6] [7], which is suitable for long monitoring distances in outdoor conditions, or in 3D [8], which is suitable for low focal distances, where obvious changes in parallax induced by 3D viewpoint translations occur.…”

Section: Introduction (mentioning)

confidence: 99%