Context. The Richardson-Lucy method is the most popular deconvolution method in astronomy because it preserves the number of counts and the non-negativity of the original object. Regularization is, in general, obtained by an early stopping of Richardson-Lucy iterations. In the case of point-wise objects such as binaries or open star clusters, iterations can be pushed to convergence. However, it is well known that Richardson-Lucy is an inefficient method. In most cases and, in particular, for low noise levels, acceptable solutions are obtained at the cost of hundreds or thousands of iterations; consequently, several approaches to accelerating Richardson-Lucy have been proposed. They are mainly based on Richardson-Lucy being a scaled gradient method for the minimization of the Kullback-Leibler divergence, or Csiszár I-divergence, which represents the data-fidelity function in the case of Poisson noise. In this framework, a line search along the descent direction is considered for reducing the number of iterations.
Aims. A general optimization method, referred to as the scaled gradient projection method, has been proposed for the constrained minimization of continuously differentiable convex functions. It is applicable to the non-negative minimization of the Kullback-Leibler divergence. If the scaling suggested by Richardson-Lucy is used in this method, then it provides a considerable increase in the efficiency of Richardson-Lucy. Therefore, the aim of this paper is to apply the scaled gradient projection method to a number of imaging problems in astronomy, such as single image deconvolution, multiple image deconvolution, and boundary effect correction.
Methods. Deconvolution methods are proposed by applying the scaled gradient projection method to the minimization of the Kullback-Leibler divergence for the imaging problems mentioned above, and the corresponding algorithms are derived and implemented in Interactive Data Language (IDL). For all the algorithms, several stopping rules are introduced, including one based on a recently proposed discrepancy principle for Poisson data. To achieve a further increase in efficiency, we also consider an implementation on graphics processing units (GPUs).
Results. The proposed algorithms are tested on simulated images. The acceleration of scaled gradient projection methods achieved with respect to the corresponding Richardson-Lucy methods strongly depends on both the problem and the specific object to be reconstructed, and in our simulations the improvement achieved ranges from about a factor of 4 to more than 30. Moreover, significant accelerations of up to two orders of magnitude have been observed between the serial and parallel implementations of the algorithms. The codes are available upon request.
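The statement that Richardson-Lucy is a scaled gradient method for the Kullback-Leibler divergence can be illustrated with a small sketch. The code below is not the authors' IDL or GPU implementation; it is a minimal 1-D example under stated assumptions: a periodic (circular) convolution model, a point spread function `h` that is non-negative and sums to 1, and a constant `background`. All function names are hypothetical. With these assumptions the gradient of the divergence is 1 - A^T(y / (Ax + b)), and the usual multiplicative Richardson-Lucy update equals a gradient step of unit length scaled by the current iterate.

```python
import math


def apply_psf(h, x):
    """Imaging operator A: circular convolution of object x with PSF h."""
    n = len(x)
    return [sum(h[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


def apply_psf_transposed(h, u):
    """Transposed operator A^T (circular correlation with the PSF)."""
    n = len(u)
    return [sum(h[(i - j) % n] * u[i] for i in range(n)) for j in range(n)]


def kl_divergence(y, model):
    """Kullback-Leibler divergence of the model from the data y (Poisson data fidelity)."""
    total = 0.0
    for yi, mi in zip(y, model):
        if yi > 0:
            total += yi * math.log(yi / mi)
        total += mi - yi
    return total


def kl_gradient(h, x, y, background=0.0):
    """Gradient of the KL divergence with respect to x.

    Assumes sum(h) == 1 and periodic boundaries, so that A^T 1 = 1.
    """
    model = [v + background for v in apply_psf(h, x)]
    ratio = [yi / mi for yi, mi in zip(y, model)]
    back_projected = apply_psf_transposed(h, ratio)
    return [1.0 - b for b in back_projected]


def richardson_lucy_step(h, x, y, background=0.0):
    """One Richardson-Lucy iteration written as a scaled gradient step.

    x_new = x - diag(x) * grad, which is algebraically the same as the
    familiar multiplicative form x * A^T(y / (A x + background)).
    """
    grad = kl_gradient(h, x, y, background)
    return [xi - xi * gi for xi, gi in zip(x, grad)]
```

Starting from a strictly positive constant image and calling `richardson_lucy_step` repeatedly keeps the iterate non-negative and, for zero background, preserves the total number of counts. The scaled gradient projection method described in the abstract keeps this scaling but, according to the abstract, gains efficiency over the fixed unit step used here.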
Although deconvolution can improve the image quality of any type of microscope, the high computational time it requires has so far limited its widespread adoption. Here we demonstrate the ability of the scaled-gradient-projection (SGP) method to provide accelerated versions of the most widely used algorithms in microscopy. To achieve further increases in efficiency, we also consider implementations on graphics processing units (GPUs). We test the proposed algorithms on both synthetic and real data from confocal and STED microscopy. Combining the SGP method with the GPU implementation, we achieve a speed-up factor ranging from about 25 to 690 (with respect to the conventional algorithm). The excellent results obtained on STED microscopy images demonstrate the synergy between super-resolution techniques and image deconvolution. Further, the real-time processing preserves one of the most important properties of STED microscopy, i.e., the ability to provide fast sub-diffraction resolution recordings.
Modern automotive-grade embedded computing platforms feature high-performance Graphics Processing Units (GPUs) to support the massively parallel processing power needed for next-generation autonomous driving applications (e.g., Deep Neural Network (DNN) inference, sensor fusion, path planning, etc.). As these workload-intensive activities are pushed to higher criticality levels, there is a stronger need for scheduling algorithms that can guarantee predictability without overly sacrificing GPU utilization. Unfortunately, the real-time literature on GPU scheduling has mostly considered limited (or null) preemption capabilities, while previous efforts in broader domains were often based on programming models and APIs that were not designed to support the real-time requirements of recurring workloads. In this paper, we present the design of a prototype real-time scheduler for GPU activities on an embedded System on a Chip (SoC) featuring a cutting-edge GPU architecture by NVIDIA adopted in the autonomous driving domain. The scheduler runs as a software partition on top of the NVIDIA hypervisor, and it leverages latest-generation architectural features, such as pixel-level preemption and thread-level preemption. Such a design allowed us to implement and test a preemptive Earliest Deadline First (EDF) scheduler for GPU tasks that provides bandwidth isolation by means of a Constant Bandwidth Server (CBS). Our work involved investigating alternative programming models for compute APIs, allowing us to characterize CPU-to-GPU command submission with more detailed scheduling information. A detailed experimental characterization is presented to show the significant schedulability improvement of recurring real-time GPU tasks.
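The combination of EDF with a Constant Bandwidth Server mentioned above can be sketched with the textbook CBS rules. This is not the scheduler described in the abstract: it does not model the hypervisor, GPU command submission, or preemption, and the class name, field names, and the example server parameters are all hypothetical. It only shows how a server with budget Q and period T assigns deadlines so that a task cannot consume more than the bandwidth Q/T, and how EDF then picks the server with the earliest deadline.

```python
from dataclasses import dataclass


@dataclass
class ConstantBandwidthServer:
    """Textbook CBS state: bandwidth is max_budget / period."""
    name: str
    max_budget: float  # Q: execution time granted per period
    period: float      # T: server period
    budget: float = 0.0
    deadline: float = 0.0

    def on_job_arrival(self, now):
        """Wake-up rule applied when a job arrives at an idle server.

        If the leftover budget could exceed the bandwidth Q/T before the
        current deadline, start afresh with a full budget and a new
        deadline; otherwise keep the current budget and deadline.
        """
        bandwidth = self.max_budget / self.period
        if self.budget >= (self.deadline - now) * bandwidth:
            self.deadline = now + self.period
            self.budget = self.max_budget

    def consume(self, executed):
        """Charge executed time; on exhaustion recharge and postpone the deadline."""
        self.budget -= executed
        while self.budget <= 0:
            self.budget += self.max_budget
            self.deadline += self.period


def edf_pick(ready_servers):
    """EDF: choose the ready server with the earliest absolute deadline."""
    return min(ready_servers, key=lambda s: s.deadline)
```

As a usage illustration, two servers with (Q, T) of (2, 10) and (5, 15) that both receive a job at time 0 get deadlines 10 and 15, so `edf_pick` selects the first. Once that server has consumed its budget of 2, its deadline is postponed to 20 and the second server becomes the earliest-deadline choice, which is the isolation effect the CBS is meant to provide.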
Object detection is arguably one of the most important and complex tasks to enable the advent of next-generation autonomous systems. Recent advancements in deep learning techniques have significantly improved the detection accuracy and latency of modern neural networks, enabling their adoption in automotive, avionics and industrial embedded systems, where performance must be achieved within size, weight and power constraints. Multiple benchmarks and surveys exist to compare state-of-the-art detection networks, profiling important metrics such as precision, latency and power efficiency on Commercial-off-the-Shelf (COTS) embedded platforms. However, we observed a fundamental lack of fairness in the existing comparisons, with a number of implicit assumptions that may significantly bias the metrics of interest. These include heterogeneous settings for the input size, training dataset, confidence thresholds and, most importantly, platform-specific optimizations, which are especially important when assessing latency and energy-related values. The lack of uniform comparisons is mainly due to the significant effort required to re-implement network models, whenever openly available, on the specific platforms, to properly configure the available acceleration engines for optimizing performance, and to retrain the model using a homogeneous dataset. This paper aims at filling this gap, providing a comprehensive and fair comparison of the best-in-class Convolutional Neural Networks (CNNs) for real-time embedded systems, detailing the effort made to achieve an unbiased characterization on cutting-edge system-on-chips. Multi-dimensional trade-offs are explored for achieving a proper configuration of the available programmable accelerators for neural inference, adopting the best available software libraries. To stimulate the adoption of fair benchmarking assessments, the framework is released to the public in an open source repository.