PIX2NVS: Parameterized conversion of pixel-domain video frames to neuromorphic vision streams

Bi, Yin; Andreopoulos, Yiannis

doi:10.1109/icip.2017.8296630

Cited by 33 publications

(27 citation statements)

References 8 publications


“…Depending on the pixel's intensity difference a positive or negative event is generated. Pix2NVS [3] computes per-pixel luminance from conventional video frames. The technique generates synthetic events with inaccurate timestamps clustered to frame timestamps.…”

Section: Synthetic Events (mentioning)

confidence: 99%

Video to Events: Recycling Video Datasets for Event Cameras

Gehrig

Hidalgo-Carrió

et al. 2020

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)


Event cameras are novel sensors that output brightness changes in the form of a stream of asynchronous "events" instead of intensity frames. They offer significant advantages with respect to conventional cameras: high dynamic range (HDR), high temporal resolution, and no motion blur. Recently, novel learning approaches operating on event data have achieved impressive results. Yet, these methods require a large amount of event data for training, which is hardly available due to the novelty of event sensors in computer vision research. In this paper, we present a method that addresses these needs by converting any existing video dataset recorded with conventional cameras to synthetic event data. This unlocks the use of a virtually unlimited number of existing video datasets for training networks designed for real event data. We evaluate our method on two relevant vision tasks, i.e., object recognition and semantic segmentation, and show that models trained on synthetic events have several benefits: (i) they generalize well to real event data, even in scenarios where standard-camera images are blurry or overexposed, by inheriting the outstanding properties of event cameras; (ii) they can be used for finetuning on real data to improve over state-of-the-art for both classification and semantic segmentation.
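The citation statement above describes Pix2NVS as computing per-pixel luminance from conventional frames and producing events whose timestamps cluster at the frame timestamps. The following is a minimal sketch of that general idea, not the PIX2NVS implementation: the function names, the Rec. 601 luma weights, the log-intensity comparison, the `threshold` and `eps` parameters, and the reference-reset rule are all assumptions chosen for illustration.

```python
import math

def luma(rgb):
    """Rec. 601 luma from an (R, G, B) triple in 0..255 (assumed weighting)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def frames_to_events(frames, timestamps, threshold=0.2, eps=1.0):
    """Emit (t, x, y, polarity) tuples from successive RGB frames.

    Assumed model (hypothetical, not taken from the paper): an event fires
    when the log-luminance of a pixel differs from that pixel's reference
    value by at least `threshold`; the reference is then reset.
    """
    events = []
    # Reference log-luminance per pixel, initialised from the first frame.
    ref = [[math.log(luma(p) + eps) for p in row] for row in frames[0]]
    for frame, t in zip(frames[1:], timestamps[1:]):
        for y, row in enumerate(frame):
            for x, p in enumerate(row):
                cur = math.log(luma(p) + eps)
                diff = cur - ref[y][x]
                if abs(diff) >= threshold:
                    # Every event inherits the frame timestamp t.
                    events.append((t, x, y, 1 if diff > 0 else -1))
                    ref[y][x] = cur
    return events

# Two 1x2 frames: the left pixel brightens, the right pixel is unchanged.
frame0 = [[(10, 10, 10), (200, 200, 200)]]
frame1 = [[(100, 100, 100), (200, 200, 200)]]
timestamps = [0.0, 1 / 30]
events = frames_to_events([frame0, frame1], timestamps)
```

In this toy input only the left pixel produces an event, with positive polarity, and its timestamp is exactly the second frame's timestamp. That is the limitation the citing authors point out: without interpolation between frames, a frame-difference emulator cannot place events at sub-frame times.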


“…Similarly, MNIST-DVS and CIFAR10-DVS datasets were created by displaying a moving image on a monitor and recording with a fixed DAVIS sensor [50]. Emulator software has also been proposed in order to generate neuromorphic events from pixel-domain video formats using the change of pixel intensities of successively rendered images [26], [51]. While useful for early-stage evaluation, these datasets cannot capture the real dynamics of an NVS device due to the limited frame rate of the utilized content, as well as the limitations and artificial noise imposed by the recording or emulation environment.…”

Section: A. Object Classification (mentioning)

confidence: 99%

Graph-Based Spatio-Temporal Feature Learning for Neuromorphic Vision Sensing

Bi

Chadha

Abbas

et al. 2020

IEEE Trans. on Image Process.

Self Cite


Neuromorphic vision sensing (NVS) devices represent visual information as sequences of asynchronous discrete events (a.k.a., "spikes") in response to changes in scene reflectance. Unlike conventional active pixel sensing (APS), NVS allows for significantly higher event sampling rates at substantially increased energy efficiency and robustness to illumination changes. However, feature representation for NVS is far behind its APS-based counterparts, resulting in lower performance in high-level computer vision tasks. To fully utilize its sparse and asynchronous nature, we propose a compact graph representation for NVS, which allows for end-to-end learning with graph convolution neural networks. We couple this with a novel end-to-end feature learning framework that accommodates both appearance-based and motion-based tasks. The core of our framework comprises a spatial feature learning module, which utilizes residual-graph convolutional neural networks (RG-CNN), for end-to-end learning of appearance-based features directly from graphs. We extend this with our proposed Graph2Grid block and temporal feature learning module for efficiently modelling temporal dependencies over multiple graphs and a long temporal extent. We show how our framework can be configured for object classification, action recognition and action similarity labeling. Importantly, our approach preserves the spatial and temporal coherence of spike events, while requiring less computation and memory. The experimental validation shows that our proposed framework outperforms all recent methods on standard datasets. Finally, to address the absence of large real-world NVS datasets for complex recognition tasks, we introduce, evaluate and make available the American Sign Language letters (ASL-DVS), as well as human action datasets (UCF101-DVS, HMDB51-DVS and ASLAN-DVS).

Index Terms: Neuromorphic vision sensing, spatio-temporal feature learning, graph convolutional neural networks, object classification, human action recognition

I. INTRODUCTION

With the prevalence and advances of CMOS active pixel sensing (APS) and deep learning, researchers have achieved good performance in APS-based computer vision tasks, such as object detection [1], [2], object recognition [3], [4] and action recognition [5], [6]. However, APS cameras suffer from limited frame rate, high redundancy between frames, blurriness

YB, AC, AA and YA are with the Electronic and Electrical Engineer


“…Similarly, MNIST-DVS and CIFAR10-DVS datasets were created by displaying a moving image on a monitor and recording with a fixed DAVIS sensor [37]. Emulator software has also been proposed in order to generate neuromorphic events from pixel-domain video formats using the change of pixel intensities of successively rendered images [42,5]. While useful for early-stage evaluation, these datasets cannot capture the real dynamics of an NVS device due to the limited frame rate of the utilized content, as well as the limitations and artificial noise imposed by the recording or emulation environment.…”

Section: Object Classification (mentioning)

confidence: 99%

Graph-Based Object Classification for Neuromorphic Vision Sensing

Chadha

Abbas

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Self Cite


Neuromorphic vision sensing (NVS) devices represent visual information as sequences of asynchronous discrete events (a.k.a., "spikes") in response to changes in scene reflectance. Unlike conventional active pixel sensing (APS), NVS allows for significantly higher event sampling rates at substantially increased energy efficiency and robustness to illumination changes. However, feature representation for NVS is far behind its APS-based counterparts, resulting in lower performance in high-level computer vision tasks. To fully utilize its sparse and asynchronous nature, we propose a compact graph representation for NVS, which allows for end-to-end learning with graph convolution neural networks. We couple this with a novel end-to-end feature learning framework that accommodates both appearance-based and motion-based tasks. The core of our framework comprises a spatial feature learning module, which utilizes residual-graph convolutional neural networks (RG-CNN), for end-to-end learning of appearance-based features directly from graphs. We extend this with our proposed Graph2Grid block and temporal feature learning module for efficiently modelling temporal dependencies over multiple graphs and a long temporal extent. We show how our framework can be configured for object classification, action recognition and action similarity labeling. Importantly, our approach preserves the spatial and temporal coherence of spike events, while requiring less computation and memory. The experimental validation shows that our proposed framework outperforms all recent methods on standard datasets. Finally, to address the absence of large real-world NVS datasets for complex recognition tasks, we introduce, evaluate and make available the American Sign Language letters (ASL-DVS), as well as human action datasets (UCF101-DVS, HMDB51-DVS and ASLAN-DVS).

Figure 1: Examples of archery action captured by APS and NVS sensors. APS sensors capture images at fixed frame rates, while NVS sensors output a stream of events.
