“…Therefore, most parallel ME research on many-core platforms concentrates on the full-search method, which is inherently highly parallel. In Chen and Hang [15], Cheng et al. [16], Lee and Oh [17], and Monteiro et al. [18], the parallel full-search method is implemented on the GPU platform, and speed-ups of about 10-100x are obtained compared with the serial full-search method on a single CPU core. Although the speed-up of the full-search method is high on the GPU platform, its performance advantage is not obvious compared with the serial fast search method in HEVC or H.264/AVC on a single CPU core.…”
We propose a highly parallel and scalable motion estimation algorithm, named multilevel resolution motion estimation (MLRME), which combines the advantages of local full search and downsampling. Subsampling a video frame saves a large amount of computation, while the local full-search method exposes massive parallelism and makes full use of powerful modern many-core accelerators, such as GPUs and the Intel Xeon Phi. We integrated the proposed MLRME into HM12.0, and the experimental results showed that the encoding quality of the MLRME method is close to that of the fast motion estimation in HEVC, declining by less than 1.5%. We also implemented MLRME with CUDA, obtaining a 30-60x speed-up compared to the serial algorithm on a single CPU. Specifically, the parallel implementation of MLRME on a GTX 460 GPU can meet the real-time coding requirement with about 25 fps for the 2560 × 1600 video format, while for 832 × 480 the performance is more than 100 fps.
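The idea of combining downsampling with a local full search can be illustrated with a generic coarse-to-fine block matcher. This is a minimal sketch, not the MLRME algorithm as published: the 2x2 averaging filter, the sum of absolute differences (SAD) cost, the function names, and the default block size, search radius, and level count are all assumptions made for illustration.

```python
import numpy as np

def downsample2(frame):
    """Halve the resolution by averaging each 2x2 pixel group (odd edges are cropped)."""
    h, w = frame.shape[0] // 2 * 2, frame.shape[1] // 2 * 2
    f = frame[:h, :w].astype(np.float64)
    return (f[0::2, 0::2] + f[1::2, 0::2] + f[0::2, 1::2] + f[1::2, 1::2]) / 4.0

def local_full_search(cur, ref, y, x, block, center, radius):
    """Try every displacement within +/-radius of `center`; return the (dy, dx) with lowest SAD."""
    target = cur[y:y + block, x:x + block].astype(np.float64)
    best_cost, best_mv = np.inf, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            ry, rx = y + dy, x + dx
            # Skip candidates that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + block > ref.shape[0] or rx + block > ref.shape[1]:
                continue
            cost = np.abs(target - ref[ry:ry + block, rx:rx + block]).sum()
            if cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv

def multilevel_me(cur, ref, block=16, radius=2, levels=3):
    """Search at the coarsest level first, then double each vector and refine it one level down."""
    pyramid = [(cur, ref)]
    for _ in range(levels - 1):
        c, r = pyramid[-1]
        pyramid.append((downsample2(c), downsample2(r)))
    mvs = None
    for level in range(levels - 1, -1, -1):
        c, r = pyramid[level]
        b = max(block >> level, 2)  # block size shrinks with the resolution
        rows, cols = c.shape[0] // b, c.shape[1] // b
        new = np.zeros((rows, cols, 2), dtype=np.int64)
        for i in range(rows):
            for j in range(cols):
                if mvs is None:
                    center = (0, 0)
                else:
                    pi, pj = min(i, mvs.shape[0] - 1), min(j, mvs.shape[1] - 1)
                    center = (2 * int(mvs[pi, pj, 0]), 2 * int(mvs[pi, pj, 1]))
                new[i, j] = local_full_search(c, r, i * b, j * b, b, center, radius)
        mvs = new
    return mvs  # shape (rows, cols, 2), one (dy, dx) per full-resolution block
```

Every block at a given level is searched independently of the others, which is the property that lets a scheme like this map onto one GPU thread block per image block. With three levels and a radius of 2, the sketch evaluates at most 25 candidates per block per level while still reaching displacements of up to 14 pixels at full resolution.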
“…Initial efforts to parallelize block matching algorithms on older parallel processing platforms have been presented in [1,11,12,14,15,16]. Specifically, as far as the HS algorithm is concerned, very few research works have been published, and these refer to systolic arrays [5,11,12] rather than modern parallelization frameworks.…”
“…They have used either OpenMP or OpenMPI or GPU [16], or custom architectures [15] or special FPGA architectures [14]. None of them have ever tried to parallelize a block matching algorithm on an existing high performance embedded system, like a smart mobile phone.…”
The number of high-performance embedded systems used for multimedia applications, such as video encoding or decoding, has grown rapidly. A key component of video encoding is motion estimation, which exhibits high computational complexity and hard-to-meet deadlines. The most popular technique for motion estimation is block matching. The hierarchical search (HS) is a popular and very fast block matching algorithm that achieves the best image quality, with very high computational complexity. This complexity is usually handled through parallelization. Our work differs from that of other authors because it targets parallelization on embedded systems using the Python framework, specifically the Multiprocessing module. The experimental results on parallelizing the HS algorithm on a high-performance multi-core embedded system illustrate the usefulness of our methodology, with a speedup of up to 1.4.
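Distributing block matching over a worker pool can be sketched as follows. This is an assumption-laden illustration rather than the authors' implementation: it parallelizes a plain exhaustive SAD search per block instead of the HS algorithm itself, and the function names, block size, and search radius are hypothetical. The pool is passed in, so any object with a `map` method works.

```python
import multiprocessing
from functools import partial
from multiprocessing.pool import ThreadPool  # same map() API as multiprocessing.Pool

import numpy as np

def match_block(origin, cur, ref, block, radius):
    """Exhaustive SAD search for one block; returns (origin, best displacement)."""
    y, x = origin
    target = cur[y:y + block, x:x + block].astype(np.int64)
    best_cost, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y + dy, x + dx
            # Skip candidates that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + block > ref.shape[0] or rx + block > ref.shape[1]:
                continue
            cost = int(np.abs(target - ref[ry:ry + block, rx:rx + block]).sum())
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return origin, best_mv

def parallel_block_matching(cur, ref, pool, block=16, radius=4):
    """Hand each block origin to the pool as an independent task."""
    origins = [(y, x)
               for y in range(0, cur.shape[0] - block + 1, block)
               for x in range(0, cur.shape[1] - block + 1, block)]
    worker = partial(match_block, cur=cur, ref=ref, block=block, radius=radius)
    return dict(pool.map(worker, origins))

def run_with_processes(cur, ref, workers=4, **kwargs):
    """Process-based variant; call it from under `if __name__ == "__main__":`."""
    with multiprocessing.Pool(workers) as pool:
        return parallel_block_matching(cur, ref, pool, **kwargs)
```

Passing a `ThreadPool` is convenient for a quick correctness check, while `run_with_processes` uses separate processes and so is not limited by the interpreter lock. Each task pickles both frames to the worker in the process-based variant, and that transfer overhead is one plausible reason why measured speedups on small multi-core devices can stay modest.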
“…There are many methods based on various approaches, including gray-based [3], frequency-based [4], or feature-based [5] methods. Also, motion estimation can be computed in 2D [6] [7], which is suitable for long monitoring distances in outdoor conditions, or in 3D [8], which is suitable for low focal distances, where obvious changes in parallax induced by 3D viewpoint translations occur.…”