The use of IEEE 754-2008 half-precision floating-point numbers is an emerging trend in Graphics Processing Unit (GPU) architecture. Because the format is so compact, its use may speed up programs by reducing memory bandwidth usage and by allowing hardware designers to fit more computing units within the same die space. In this paper, we highlight the acceleration offered by half-precision floating-point numbers across different implementations of the same operation, a 2D convolution. We show that even though it may lead to a significant speed-up, the degradation brought by this new format is not always negligible. Then, we choose a deconvolution problem inspired by the SKA radio-telescope processing pipeline to show how half floats behave in a more complex application.
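The precision loss mentioned above can be illustrated with a minimal sketch that emulates half-precision arithmetic on the CPU. Everything here is an assumption made for illustration and is not taken from the paper: the helper names `to_half` and `conv2d`, the synthetic 16 × 16 image, the 3 × 3 box kernel, and the choice to round after every multiply and add. It says nothing about speed, only about the rounding error of the format.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def conv2d(image, kernel, rnd=lambda v: v):
    """Valid-mode 2D convolution; `rnd` is applied after each multiply and add."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    # True convolution flips the kernel in both directions.
                    k = rnd(kernel[kh - 1 - a][kw - 1 - b])
                    acc = rnd(acc + rnd(rnd(image[i + a][j + b]) * k))
            row.append(acc)
        out.append(row)
    return out

# Hypothetical test data: a deterministic synthetic image and a box-blur kernel.
image = [[((7 * i + 13 * j) % 32) / 32.0 for j in range(16)] for i in range(16)]
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]

ref = conv2d(image, kernel)                # double-precision reference
half = conv2d(image, kernel, rnd=to_half)  # emulated half precision
max_err = max(abs(r - h) for rr, hr in zip(ref, half) for r, h in zip(rr, hr))
print(f"max abs error of the half-precision result: {max_err:.3e}")
```

Rounding the accumulator after every addition is the pessimistic case; accumulating in a wider format and rounding only the final result would reduce the error, at the cost of some of the bandwidth and area savings.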
This article tackles the entire lifecycle of an algorithm, from its design to its implementation. It presents a method for making efficient choices at algorithm design time, given the characteristics of the underlying hardware target. As of today, computing the optical flow of a stream of images is still a demanding task. Meanwhile, the use of Graphics Processing Units (GPUs) has become mainstream and allows substantial gains in processing frame rate. In this paper, we focus on a specific variational method (CLG [1]) in which linear systems have to be solved. These systems depend on two parameters, α and ρ. To solve the problem efficiently, we study convergence speed with respect to the model's parameters. We benchmark common linear solvers with preconditioners to identify the fastest in terms of convergence per iteration. We then show that, once implemented on GPUs, the most efficient solver changes depending on the model parameters. For 640 × 480 images, with the right choice of solver and parameters, our implementation can solve the system to a relative accuracy of 10e-7 in 0.25 ms on a Titan V GPU. All results are aggregated over a 30-image set to increase confidence that they generalize.
Determining the optical flow of a video is a compute-intensive task essential for computer vision. To achieve this processing in real time, the whole algorithm deployment chain must be designed with efficiency in mind. The development is usually divided into two parts: first, designing an algorithm that meets precision constraints; then, implementing and optimizing its execution on the targeted platform. We argue that unifying those operations enhances performance on the embedded processor. This paper is based on an industrial use case of computer vision. The objective is to determine dense optical flow in real time on an embedded GPU platform: the Nvidia AGX Xavier. The CLG (Combined Local-Global) optical flow method, initially chosen, is analyzed to understand the convergence speed of its underlying optimization problem. The Jacobi solver is selected for implementation because of its parallel nature. The whole multi-level processing is then ported to the GPU, using several specific optimization strategies. In particular, we use the roofline model to analyze the impact of fusing the solver's iterations. As a result, with a 30 W power budget, our implementation runs at 60 FPS on 640 × 512 images with four-level processing. We hope this example provides feedback on the issues that arise when porting a method to a parallel platform and serves as a basis for further implementations of computer vision algorithms on specialized hardware.
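The parallel nature of the Jacobi solver can be seen in a minimal sketch. This is a generic Jacobi iteration on a small, hypothetical, diagonally dominant 3 × 3 system chosen for illustration; it is not the CLG system from the paper, and the function name `jacobi` and the iteration count are assumptions.

```python
def jacobi(A, b, iterations=50):
    """Plain Jacobi iteration for A x = b, starting from the zero vector."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Each component reads only the previous iterate, so all n updates
        # are independent and could run as one thread each on a GPU.
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x

# Hypothetical toy system whose exact solution is (1, 2, 3).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]

x = jacobi(A, b)
residual = max(abs(sum(A[i][j] * x[j] for j in range(3)) - b[i]) for i in range(3))
print("solution:", x)
print("max residual:", residual)
```

Because the new iterate is built entirely from the old one, no update has to wait for another within an iteration. Gauss-Seidel, by contrast, consumes freshly updated components and is therefore harder to parallelize.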