Modern video encoders such as AOMedia Video 1 (AV1) implement several complex tools to reach the required high level of compression efficiency. Fractional Motion Estimation (FME) is one of these tools, and in AV1 the FME defines 90 different filters. To handle such complexity, hardware acceleration using approximate computing has become an alternative worth exploring. This paper presents an approximate solution for the AV1 FME interpolation filters, based on approximating the original filter coefficients to obtain more hardware-friendly coefficients. The approximate version was designed in hardware and achieves real-time interpolation for UHD 8K video at 30 frames per second when synthesized with a 40 nm TSMC standard-cell technology. The designed architecture dissipates 26.79 mW, which represents a power reduction of more than 80% compared with the original precise solution. The approximation results in a small average coding efficiency degradation of 0.54% in BD-BR. Compared with related works, this architecture achieves a significant power reduction (2.1 to 4.8 times) even while supporting more complex tools.
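The idea of trading exact filter coefficients for hardware-friendly ones can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 8-tap kernel `TAPS`, the 7-bit normalization, the limit of two shift terms per tap, and the helper names `to_shift_terms`, `filter_exact`, and `filter_shift_add`. Each integer tap is approximated greedily by a sum of signed powers of two, so that every multiplication can be replaced by shifts and additions.

```python
# Illustrative 8-tap half-sample kernel (assumed, taps sum to 128 = 2**7).
TAPS = [0, 2, -14, 76, 76, -14, 2, 0]


def to_shift_terms(coeff, max_terms=2):
    """Greedily approximate an integer tap by at most `max_terms`
    signed powers of two. Returns a list of (sign, shift) pairs."""
    terms = []
    residual = coeff
    for _ in range(max_terms):
        if residual == 0:
            break
        sign = 1 if residual > 0 else -1
        mag = abs(residual)
        lo = mag.bit_length() - 1  # largest power of two <= mag
        # Pick whichever of 2**lo and 2**(lo + 1) is closer to mag.
        shift = lo if mag - (1 << lo) <= (1 << (lo + 1)) - mag else lo + 1
        terms.append((sign, shift))
        residual -= sign * (1 << shift)
    return terms


def terms_value(terms):
    """Integer value represented by a list of (sign, shift) pairs."""
    return sum(sign * (1 << shift) for sign, shift in terms)


def filter_exact(samples, taps, norm_shift=7):
    """Reference filter: multiply-accumulate, then round and normalize."""
    acc = sum(c * x for c, x in zip(taps, samples))
    return (acc + (1 << (norm_shift - 1))) >> norm_shift


def filter_shift_add(samples, tap_terms, norm_shift=7):
    """Approximate filter: only shifts and additions, no multipliers."""
    acc = 0
    for x, terms in zip(samples, tap_terms):
        for sign, shift in terms:
            acc += sign * (x << shift)
    return (acc + (1 << (norm_shift - 1))) >> norm_shift


if __name__ == "__main__":
    approx_terms = [to_shift_terms(c) for c in TAPS]
    print("approximated taps:", [terms_value(t) for t in approx_terms])
    row = [90, 95, 100, 110, 120, 125, 128, 130]
    print("exact:", filter_exact(row, TAPS))
    print("shift-add:", filter_shift_add(row, approx_terms))
```

In this toy example the tap 76 becomes 64 + 8 = 72, so the approximated taps sum to 120 rather than 128 and the filter gain drops slightly. That is a simple picture of how coefficient approximation can cost coding efficiency, but the sketch makes no claim about which coefficients the paper actually uses or how it handles this trade-off.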
Modern video encoders such as AOMedia Video 1 (AV1) implement several complex tools to reach the required high level of compression efficiency. Fractional Motion Estimation (FME) is one of these complex tools, and the AV1 FME defines 42 different interpolation filters. To handle such complexity, hardware acceleration using approximate computing has become an interesting alternative to explore. This paper presents three optimized approximate architectures for the AV1 FME interpolation filters. The architectures achieve real-time interpolation for UHD 4K video at 30 frames per second in a low-cost, low-power, and memory-efficient design. The architectures were synthesized for a 40 nm TSMC standard-cell technology, reaching power reductions of up to 83% compared with a precise architecture and up to 20% compared with a previously published approximate solution. The area reductions were also significant: up to 83% and 40%, respectively. The architectures also allow a memory bandwidth reduction of up to 59.5% in comparison with state-of-the-art solutions. The approximations result in small coding efficiency degradations of 0.54% and 1.25% in BD-BR. The presented architectures have the best results found in the literature when considering the trade-off among hardware cost, power dissipation, processing rate, memory bandwidth, and coding efficiency.