“…Parallelizing DCT with SIMD instructions was one of the targets of [4]. In [5] a combined CPU-GPUs approach for parallel motion estimation is presented, while [6] includes a comparative study between motion estimation parallelism using CUDA cores, MPI and OpenMP. Such works are rather orthogonal to ours, since they tackle parallelism at a level lower than tiles, thus, can be (in principle) incorporated into tile parallelism approaches.…”
Abstract: HEVC has emerged as the new video coding standard, promising higher compression ratios than its predecessors. This improvement comes at a high computational cost. For this reason, HEVC offers three coarse-grained parallelization options: wavefront, slices, and tiles. In this paper we focus on tile parallelism, a relatively new concept whose effects are not yet fully explored. In particular, we investigate the problem of partitioning a frame into tiles so that, under a one-to-one assignment of tiles to CPU cores, the cores are load balanced and maximum speedup can be achieved. We propose various heuristics for the problem, with a focus on low-delay coding, and evaluate them against state-of-the-art approaches. Results demonstrate that particular heuristic combinations clearly outperform their counterparts in the literature.
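To make the load-balancing problem in the abstract concrete, here is a minimal sketch of one way to split CTU columns into contiguous tile columns so that the heaviest tile is as light as possible. It is not one of the paper's heuristics: the per-column cost estimates (`col_cost`), the function name, and the restriction to one dimension are all assumptions made for illustration.

```python
from typing import List


def balanced_column_boundaries(col_cost: List[int], num_tiles: int) -> List[int]:
    """Split columns into `num_tiles` contiguous groups, minimising the
    heaviest group. Returns the start index of each group.

    `col_cost` is a hypothetical per-column cost estimate (for example,
    encoding time measured on the previous frame).
    """
    if not 1 <= num_tiles <= len(col_cost):
        raise ValueError("num_tiles must be between 1 and the column count")

    def tiles_needed(cap: int) -> int:
        # Greedy count of groups when no group may exceed `cap`.
        count, load = 1, 0
        for c in col_cost:
            if load + c > cap:
                count += 1
                load = 0
            load += c
        return count

    # Binary search for the smallest feasible cap on the heaviest group.
    lo, hi = max(col_cost), sum(col_cost)
    while lo < hi:
        mid = (lo + hi) // 2
        if tiles_needed(mid) <= num_tiles:
            hi = mid
        else:
            lo = mid + 1

    # Rebuild the boundaries for that cap, forcing extra cuts at the tail
    # so that exactly `num_tiles` groups are produced.
    starts, load, n = [0], 0, len(col_cost)
    for i, c in enumerate(col_cost):
        tiles_left = num_tiles - len(starts)  # groups still to be opened
        cols_left = n - i                     # columns not yet assigned
        if i > 0 and (load + c > lo or cols_left <= tiles_left):
            starts.append(i)
            load = 0
        load += c
    return starts


print(balanced_column_boundaries([4, 2, 7, 1, 3, 5], 3))
```

With the example costs, the three tile columns carry loads of 6, 8, and 8, so no core would be assigned more than 8 cost units under a one-to-one tile-to-core mapping.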
“…This scheme enabled concurrent deblocking filtering with limited synchronization effort, independently of slice configuration. Several works have focused on the use of GPU to accelerate the ME process for H.264/AVC [9][10][11][12][13][14]. Most GPU-based ME algorithms employ the full-search method because it is suitable for the SIMD (single instruction and multiple data) architecture of GPU.…”
Section: Introduction
“…In addition, the frame-level parallel encoding technique on multiple CPUs is used to improve the overall throughput. Monteiro et al. developed an ME algorithm on three kinds of platforms: multi-core general-purpose processors, cluster/grid machines using the message passing interface, and GPUs [14]. Although the GPU-based ME achieves significant speed-ups against the other platforms, only integer-pel ME and no SCP are considered.…”
The recent video compression standard, HEVC (high efficiency video coding), will most likely be used in various applications in the near future. However, the encoding process is far too slow for real-time applications. At the same time, the computing capabilities of GPUs (graphics processing units) have become considerably more powerful. In this paper, we propose a GPU-based parallel motion estimation (ME) algorithm to enhance the performance of an HEVC encoder. A frame is partitioned into two subframes for pipelined execution to improve GPU utilization. The flow chart is redetermined to resolve data hazards in the pipelined execution. Two new methods are introduced in the proposed ME: decision of a representative search center position (RSCP) and warp-based concurrent parallel reduction (WCPR). The RSCP employs motion vectors of a co-located CTU in a previously encoded frame to solve a dependency problem in parallel computation with negligible coding loss. WCPR concurrently executes several parallel reduction operations, which increases thread utilization from 20 % to 89 % without any thread synchronization. The proposed encoder makes the ME portion of the encoder negligible, with a 2.2 % bitrate increase against the HEVC test model (HM) encoder. In terms of ME alone, the proposed ME is 130.7 times faster than that of the HM encoder.
“…As a result, the proposed algorithm provides tremendous speedup if implemented on modern high performance computing (HPC) platforms ranging from multicore/many-core machine architectures to graphics processing units to supercomputers. In the literature, there have been several attempts to parallelize motion estimation [37][38][39][40][41][42]. Several works have proposed applying GPUs for motion estimation.…”
Section: Introduction
“…In [39], implementations of the ES algorithm, the diamond search (DS) algorithm, and the four-step search (4SS) algorithm in CUDA have been proposed. A parallel implementation of the ES algorithm on the GPU using CUDA is also proposed in [40,41] along with a parallel solution for multi-core processors using the Open Multi-Processing (OpenMP) library and a distributed solution for cluster/grid machines using the Message Passing Interface (MPI) library [41]. GPU-based hierarchical motion estimation in CUDA has been proposed in [42].…”
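For readers unfamiliar with the exhaustive search (ES, also called full search) that these excerpts refer to, the following is a minimal serial sketch of integer-pel block matching using the sum of absolute differences (SAD). It is not taken from any of the cited works; the function names, the list-of-lists frame layout, and the tie-breaking rule are assumptions. Every candidate displacement is evaluated independently of the others, which is what makes the method a natural fit for SIMD-style GPU execution.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # assumed layout: frame[y][x] holds a luma sample


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (bx, by) in
    `cur` and the block displaced by (dx, dy) in `ref`."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(
    cur: Frame, ref: Frame, bx: int, by: int, n: int, search_range: int
) -> Optional[Tuple[int, int, int]]:
    """Evaluate every integer displacement within +/- search_range and
    return (best_sad, dx, dy); ties keep the first candidate in scan order."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block falls outside the frame.
            if not (0 <= by + dy and by + dy + n <= height):
                continue
            if not (0 <= bx + dx and bx + dx + n <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic example: `cur` is `ref` shifted by (dx=2, dy=1), clamped at edges.
ref = [[16 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
print(full_search(cur, ref, bx=2, by=2, n=3, search_range=2))
```

In a GPU version, each iteration of the two displacement loops would typically map to its own thread or thread group, with the final minimum found by a parallel reduction.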