Abstract: Although CNN-based deblurring models have shown superiority in solving motion blur, restoring a photorealistic image from severe motion blur remains an ill-posed problem due to the loss of temporal information and textures. Event cameras such as the Dynamic and Active-pixel Vision Sensor (DAVIS) [3] can simultaneously produce gray-scale Active Pixel Sensor (APS) frames and events; the events, which capture fast motions at very high temporal resolution (1 µs), can provide extra information for blurry APS…
“…Hybrid pipelines: Hybrids of complementary event and image sensors can remove some of their individual shortcomings while retaining their benefits. Compact hybrid sensor solutions sharing the same lens also exist [1] and can be employed, for instance, for motion deblurring, as shown recently [33]. Alternatively, therefore, for energy-efficient, higher-precision, and detailed alignment, the coarse face pose estimation can trigger a more precise frame-based image acquisition and processing stage after a first detection, implementing an approach similar to progressive initialization [43].…”
Section: Discussion
“…Tree ensembles have also been suggested in a very recent study [32] instead of DNNs for similar efficiency reasons, though for a different vision problem. Although DNNs have also been applied with ECs to different vision problems [33, 34, 35], those studies aim at benefits of ECs other than energy efficiency and are thus computationally very demanding. The comparison of our method to human performance in facial landmark placement on the same dataset shows that an extremely randomized trees (ERT) cascade applied directly to pixel-events—i.e., without facial image reconstruction or training on image datasets—achieves good accuracy.…”
The event camera (EC) is an emerging bio-inspired sensor that can serve as an alternative or complementary vision modality, with the benefits of energy efficiency, high dynamic range, and high temporal resolution coupled with activity-dependent sparse sensing. In this study we investigate the problem of face pose alignment with ECs, an essential pre-processing stage for facial processing pipelines. EC-based alignment can unlock all these benefits in facial applications, especially where motion and dynamics carry the most relevant information, since ECs sense temporal change. We specifically aim at efficient processing by developing a coarse alignment method that handles large pose variations in facial applications. For this purpose, we have prepared, with multiple human annotations, a dataset of extreme head rotations with varying motion intensity. We propose a motion-detection-based alignment approach that generates activity-dependent pose-events, preventing unnecessary computation in the absence of pose change. The alignment is realized by cascaded regression with extremely randomized trees. Since EC sensors perform temporal differentiation, we characterize the alignment performance across different head movement speeds and face localization uncertainty ranges, as well as face resolution and predictor complexity. Our method obtained a 2.7% alignment failure rate on average, whereas annotator disagreement was 1%. The promising coarse alignment performance on EC sensor data, together with a comprehensive analysis, demonstrates the potential of ECs in facial applications.
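The cascaded regression with extremely randomized trees described above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the paper's implementation: each "tree" is reduced to a depth-1 stump thresholding the difference of two randomly chosen event-map pixels (the fully random splits are what make the trees "extremely" randomized), each stage fits the residual of the previous stages, and all sizes and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ert_cascade(X, y, n_stages=4, n_stumps=80):
    """Cascade of extremely randomized stumps regressing a pose target
    directly from flattened event-map pixels. Each stump compares two
    random pixels against a fully random threshold (no split search);
    each stage fits the residual left by the previous stages."""
    pred = np.zeros_like(y, dtype=float)
    cascade = []
    for _ in range(n_stages):
        resid = y - pred
        stage = []
        for _ in range(n_stumps):
            i, j = rng.integers(0, X.shape[1], size=2)
            thr = rng.uniform(-1.0, 1.0)        # random threshold: the ERT idea
            mask = (X[:, i] - X[:, j]) > thr
            left = resid[~mask].mean() if (~mask).any() else 0.0
            right = resid[mask].mean() if mask.any() else 0.0
            stage.append((i, j, thr, left, right))
        # stage prediction = average of its stumps' leaf values
        stage_pred = sum(np.where((X[:, i] - X[:, j]) > t, r, l)
                         for i, j, t, l, r in stage) / n_stumps
        pred += stage_pred
        cascade.append(stage)
    return cascade, pred

# toy pose target: a synthetic "yaw" label driven by two event-map pixels
X = rng.uniform(-1, 1, size=(300, 64))   # 300 flattened 8x8 event maps
y = 2.0 * (X[:, 3] - X[:, 40])
cascade, pred = fit_ert_cascade(X, y)
mse0 = float(np.mean(y ** 2))            # error of the zero predictor
mse = float(np.mean((y - pred) ** 2))    # training error of the cascade
```

Each stage provably lowers in-sample squared error (leaf means are projections onto the residual), which is the property the cascade exploits; a real deployment would add held-out validation and deeper trees.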
“…Several studies, such as E2VID [20], EventSR [21], and HDN [22], reconstruct events back into conventional images, so that the generated images can be processed like those in conventional vision tasks. The reconstruction methods usually combine recursive modules such as ConvLSTM [23] with encoder-decoder structures such as UNet [24] to build images from the events [20], [25].…”
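For contrast with the learned reconstruction methods in the snippet above, a much simpler non-learned baseline integrates per-pixel polarity sums directly (here with a leak, in log-intensity space). This sketch assumes events have already been binned into frames of summed polarities; the constants `c` and `decay` are illustrative, and E2VID-style methods replace this whole loop with a recurrent encoder-decoder (ConvLSTM + UNet).

```python
import numpy as np

def integrate_events(event_frames, c=0.2, decay=0.95):
    """Leaky per-pixel integration of polarity-sum frames in
    log-intensity space, then exponentiate and normalize.
    A non-learned stand-in for event-to-image reconstruction."""
    log_img = np.zeros_like(event_frames[0], dtype=np.float64)
    for frame in event_frames:                 # frames of summed polarities
        log_img = decay * log_img + c * frame  # leak old state, add new events
    img = np.exp(log_img)                      # back to linear intensity
    return img / img.max()                     # normalize to (0, 1]

# two tiny 2x2 polarity-sum frames as a smoke test
frames = np.array([[[1.0, 0.0], [0.0, -1.0]],
                   [[0.0, 2.0], [1.0, 0.0]]])
img = integrate_events(frames)
```

The leak bounds drift from noise events, which is the same failure mode the recurrent state in learned models must handle.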
Neuromorphic event cameras, which capture optical changes in a scene, have drawn increasing attention due to their high speed and low power consumption. However, event data are noisy, sparse, and nonuniform in the spatio-temporal domain, with extremely high temporal resolution, making them challenging to process with traditional deep learning algorithms. To enable convolutional neural network models for event vision tasks, most methods encode events into point-cloud or voxel representations, but their performance still leaves much room for improvement. Additionally, because event cameras only detect changes in the scene, relative movement can lead to misalignment, i.e., the same pixel may refer to different real-world points at different times. To this end, this work proposes the aligned compressed event tensor (ACE) as a novel event data representation, and a framework called branched event net (BET) for event-based vision in both static and dynamic scenes. We apply them to object classification and action recognition tasks on various datasets and show that they surpass state-of-the-art methods by significant margins. Specifically, our method achieves 98.88% accuracy on the DVS128 action recognition task and outperforms the second-best method by large margins of 4.85%, 9.56%, and 2.33% on the N-Caltech101, DVSAction, and NeuroIV datasets, respectively. Furthermore, the proposed ACE-BET is efficient, achieving the fastest inference speed among the methods tested.
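The voxel representations mentioned above can be sketched generically: each (x, y, t, polarity) event is splatted into a fixed-size (bins × H × W) tensor, with its polarity linearly interpolated between the two nearest temporal bins so a CNN can consume it. This is a common generic encoding, not the paper's ACE representation, and all names and shapes are illustrative.

```python
import numpy as np

def events_to_voxel(xs, ys, ts, ps, H, W, n_bins=5):
    """Encode per-event arrays (x, y, timestamp, polarity) into a
    (n_bins, H, W) voxel grid with linear interpolation along time."""
    voxel = np.zeros((n_bins, H, W), dtype=np.float32)
    t0, t1 = ts.min(), ts.max()
    tn = (ts - t0) / max(t1 - t0, 1e-9) * (n_bins - 1)  # scale into bin coords
    lo = np.floor(tn).astype(int)
    frac = tn - lo
    # split each event's polarity between its two neighbouring time bins
    np.add.at(voxel, (lo, ys, xs), (1.0 - frac) * ps)
    np.add.at(voxel, (np.clip(lo + 1, 0, n_bins - 1), ys, xs), frac * ps)
    return voxel

# three events on a 4x4 sensor over a 1-second window
xs = np.array([0, 1, 2])
ys = np.array([0, 1, 2])
ts = np.array([0.0, 0.5, 1.0])
ps = np.array([1.0, -1.0, 1.0])
voxel = events_to_voxel(xs, ys, ts, ps, H=4, W=4)
```

`np.add.at` is used instead of fancy-indexed `+=` so that repeated (bin, y, x) coordinates accumulate rather than overwrite; total polarity mass is preserved by construction.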
“…Pan et al. [15] proposed an event-based double integral model for obtaining a high-frame-rate video from events and a blurry intensity image, and Lin et al. [14] implemented the physical model proposed by [15] as a neural network, achieving high performance in video deblurring and interpolation. Moreover, Wang et al. [28] unified denoising, deblurring, and super-resolution in one model via an event-enhanced degeneration model, and Zhang et al. [29] proposed a hybrid deblurring network for image deblurring with a learned event representation. These methods have demonstrated the advantages of events for image enhancement.…”
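The event-based double integral (EDI) idea cited above can be worked through numerically. A blurry frame B is the temporal average of latent frames L(t) = L(f) · exp(c · E(t)), where E(t) sums event polarities from the reference time f to t and c plays the role of the contrast threshold; inverting gives L(f) = B / mean_t exp(c · E(t)). The sketch below verifies this inversion on synthetic data under that model; the value of `c` and all array shapes are illustrative, not from the papers.

```python
import numpy as np

def edi_deblur(blurry, E, c=0.2):
    """Invert the EDI blur model: divide the blurry frame by the
    temporal mean of exp(c * E(t)). E has shape (T, H, W), holding
    cumulative polarity sums at T sample times."""
    gain = np.exp(c * E).mean(axis=0)
    return blurry / gain

# synthetic check: synthesize a blur from a known latent frame, then invert
rng = np.random.default_rng(1)
L = rng.uniform(0.2, 1.0, size=(8, 8))           # latent sharp frame L(f)
E = rng.integers(-3, 4, size=(16, 8, 8)).astype(float)
B = (L[None] * np.exp(0.2 * E)).mean(axis=0)     # forward EDI blur model
L_hat = edi_deblur(B, E, c=0.2)                  # recovered latent frame
```

On real data c must be estimated and the events are noisy, which is exactly the gap the learned variants [14], [28], [29] address.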
Due to sensor limitations, the spatial resolution of event data is relatively low compared to that of conventional frame-based cameras. However, the low-spatial-resolution events recorded by event cameras are rich in temporal information, which helps image deblurring, while the intensity images captured by frame cameras are high-resolution and can in turn improve the quality of events. Considering this complementarity between events and intensity images, this paper proposes an alternating model that deblurs high-resolution images with the help of low-resolution events. The model is composed of two components: a DeblurNet and an EventSRNet. It first uses the DeblurNet to obtain a preliminary sharp image aided by low-resolution events. It then enhances the quality of the events with the EventSRNet by extracting structure information from the generated sharp image. Finally, the enhanced events are fed back into the DeblurNet to obtain a higher-quality intensity image. Extensive evaluations on the synthetic GoPro dataset and the real RGB-DAVIS dataset show the effectiveness of the proposed method.
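The alternating control flow described in this abstract can be sketched as a small skeleton. `deblur_net` and `event_sr_net` here are placeholders standing in for the paper's DeblurNet and EventSRNet (their real interfaces are not specified in the abstract): each round sharpens the image using the current events, then refines the events using the sharpened image.

```python
import numpy as np

def alternating_restore(blurry_hr, events_lr, deblur_net, event_sr_net,
                        n_rounds=2):
    """Alternate between image deblurring and event enhancement.
    Placeholder callables model the two networks; n_rounds=2 matches
    the abstract's deblur -> enhance events -> deblur again scheme."""
    events = events_lr
    sharp = blurry_hr
    for _ in range(n_rounds):
        sharp = deblur_net(blurry_hr, events)  # sharpen with current events
        events = event_sr_net(events, sharp)   # refine events with sharp image
    return sharp

# stand-in callables that just record the call order
calls = []
dn = lambda b, e: (calls.append("deblur"), b)[1]
esr = lambda e, s: (calls.append("sr"), e)[1]
out = alternating_restore(np.zeros((4, 4)), np.zeros((2, 4, 4)), dn, esr)
```

The design point is that each component always consumes the other's latest output, so improvements compound across rounds.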