“…As explained in Section IV, the related works [16], [21], [22], [17], [23], [18], and [24] cannot be fairly compared with our architectures, since the number of filters supported, the complexity of these filters and the used technology do not match with the defined in our work.…”
Section: Synthesis Results and Related Work (mentioning)
confidence: 98%
“…The work in [23] proposes an architecture for the VP9-10 FME and MC, processing videos up to UHD 8K@30fps with precise filters. In the same way as the VP9 work presented in [17], the filters differ from the AV1 specifications, making impossible a fair comparison.…”
Section: Related Work (mentioning)
confidence: 99%
“…A total of 22 block sizes are supported by the AV1 FME, ranging from 128×128 down to 4×4, with different rectangular partitions, as explained in Section II. To deal with it, each block can be decomposed into smaller sub-blocks, allowing for regular processing and memory access [23]. The best sub-block size should take into account parameters such as memory bandwidth, samples to process, and parallelism level.…”
Section: A Definition Of the Basic Partition Size (mentioning)
confidence: 99%
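The sub-block decomposition described in the citation above can be sketched as follows. This is only an illustration, not the cited architecture: the function name `split_into_subblocks` and the 4×4 default sub-block size are assumptions made for this example (the quoted text does not state which size was chosen), and blocks are assumed to be exact multiples of the sub-block size.

```python
def split_into_subblocks(width, height, sub_w=4, sub_h=4):
    """Tile a width x height block with sub_w x sub_h sub-blocks.

    Returns a list of (x, y, w, h) tuples in raster order.
    The 4x4 default is a hypothetical choice for illustration only.
    """
    if width % sub_w or height % sub_h:
        raise ValueError("block dimensions must be multiples of the sub-block size")
    return [
        (x, y, sub_w, sub_h)
        for y in range(0, height, sub_h)   # rows of sub-blocks, top to bottom
        for x in range(0, width, sub_w)    # sub-blocks within a row, left to right
    ]


# A 128x128 block and a rectangular 16x8 block, both tiled with 4x4 sub-blocks.
largest = split_into_subblocks(128, 128)
rectangular = split_into_subblocks(16, 8)
```

Because every block size then reduces to the same unit of work, a larger sub-block means fewer but wider memory accesses, while a smaller one means more iterations per block; that is the trade-off among bandwidth, samples to process, and parallelism that the quoted passage refers to.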
“…Other works in the literature also proposed hardware architectures for interpolation tools, however without the use of approximate computing, like [21], [22], [23], [24]. Some of these works target other video standards rather than AV1, and some are targeting other video coding tools that also require interpolation filters, like the MC.…”
Section: Introduction (mentioning)
confidence: 99%
“…Some of these works target other video standards rather than AV1, and some are targeting other video coding tools that also require interpolation filters, like the MC. The works [21] and [22] present architectures for the Sharp and Regular filters of AV1 MC and FME, the work [23] presents an architecture for the VP9-10 MC and FME, and the work [24] proposed an architecture for the VVC MC.…”
Modern video encoders like the AOMedia Video 1 (AV1) implement several complex tools to reach the required high level of compression efficiency. Fractional Motion Estimation (FME) is one of these complex tools, and the AV1 FME defines 42 different interpolation filters. To handle such complexity, hardware acceleration using approximate computing has become an interesting alternative to explore. This paper presents three optimized approximate architectures for the AV1 FME interpolation filters. The architectures reach real-time interpolation for UHD 4K videos at 30 frames per second in a low-cost, low-power, and memory-efficient design. The architectures were synthesized for a 40nm TSMC standard-cell technology, reaching power gains of up to 83% when compared to a precise architecture and up to 20% when compared to a previously published approximate solution. The area gains were also significant: up to 83% and 40%, respectively. The architectures also allow a memory bandwidth reduction of up to 59.5% in comparison with state-of-the-art solutions. The approximations implied small coding efficiency degradations of 0.54% and 1.25% in BD-BR. The presented architectures have the best results found in the literature when considering the trade-off among hardware cost, power dissipation, processing rate, memory bandwidth, and coding efficiency.
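To give a sense of scale for the real-time targets mentioned above (UHD 4K at 30 frames per second here, and UHD 8K at 30 frames per second for the work in [23]), the short sketch below computes the number of luma samples per second implied by each. It assumes the usual 3840×2160 and 7680×4320 frame sizes, which the text does not state explicitly, and it counts luma samples only; it says nothing about how many interpolated samples an FME architecture must actually produce per input sample.

```python
def luma_samples_per_second(width, height, fps):
    """Luma samples that must be handled each second for a given video format."""
    return width * height * fps


# Assumed frame sizes (not given in the text): UHD 4K = 3840x2160, UHD 8K = 7680x4320.
uhd4k_30 = luma_samples_per_second(3840, 2160, 30)
uhd8k_30 = luma_samples_per_second(7680, 4320, 30)

print(f"UHD 4K@30fps: {uhd4k_30:,} luma samples/s")
print(f"UHD 8K@30fps: {uhd8k_30:,} luma samples/s")
```

Under these assumptions the 8K target carries four times the sample rate of the 4K target, which is one reason the quoted citations note that throughput figures alone do not make the architectures directly comparable.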