2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00573

End-to-End Learning of Representations for Asynchronous Event-Based Data

Abstract: Event cameras are vision sensors that record asynchronous streams of per-pixel brightness changes, referred to as "events". They have appealing advantages over frame-based cameras for computer vision, including high temporal resolution, high dynamic range, and no motion blur. Due to the sparse, non-uniform spatiotemporal layout of the event signal, pattern recognition algorithms typically aggregate events into a grid-based representation and subsequently process it by a standard vision pipeline, e.g., Convolutional Neural Networks (CNNs).
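The grid-based aggregation the abstract refers to can be made concrete with a short sketch. Below is a minimal, illustrative Python example of one common fixed representation: a spatiotemporal voxel grid with bilinear weighting along the time axis. The paper's actual contribution is to make this event-to-grid conversion differentiable and learnable end-to-end, which this fixed sketch does not implement; the function and parameter names (events_to_voxel_grid, num_bins) are our own.

```python
import numpy as np

def events_to_voxel_grid(x, y, t, p, height, width, num_bins):
    """Accumulate an event stream into a (num_bins, H, W) voxel grid.

    x, y : 1-D integer NumPy arrays of pixel coordinates
    t    : 1-D array of monotonically increasing timestamps
    p    : 1-D array of polarities in {0, 1}
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    # Normalize timestamps to [0, num_bins - 1] so each event spreads
    # its polarity over the two nearest temporal bins (bilinear in time).
    t_norm = (num_bins - 1) * (t - t[0]) / max(t[-1] - t[0], 1e-9)
    left = np.floor(t_norm).astype(int)
    w_right = t_norm - left
    pol = 2.0 * p.astype(np.float32) - 1.0  # map {0, 1} -> {-1, +1}
    # Scatter-add each event's weighted polarity into its two bins.
    np.add.at(grid, (left, y, x), pol * (1.0 - w_right))
    np.add.at(grid, (np.clip(left + 1, 0, num_bins - 1), y, x), pol * w_right)
    return grid
```

Stacking such grids per polarity, or replacing the fixed bilinear kernel with a learned one, recovers variants of the event spike tensor family discussed in the paper.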

Cited by 231 publications (262 citation statements)
References 56 publications
“…Reconstructed intensity image by [8]. Grid-like representations are compatible with conventional computer vision methods [83].…”
Section: Event Processing
confidence: 99%
“…Instead of simply averaging event rates to obtain input frames, our approach generalizes to using more advanced features for event-based vision, such as time surfaces (Sironi et al., 2018), event spike tensors (Gehrig et al., 2019) or motion-based features (Clady et al., 2017). As use-cases for event-based vision are becoming increasingly challenging (Gallego et al., 2019), and neuromorphic hardware platforms become more mature (DeBole et al., 2019), our approach fills an important gap to provide powerful SNNs ready for deployment on those platforms.…”
Section: Discussion
confidence: 99%
“…
Method | Accuracy [%] | # params | # ops [MOps]
HATS / linear SVM (Sironi et al., 2018) | 90.2 | - | -
Rec. U-Net + CNN (Rebecq et al., 2019) | 91.0 | > 10^6 | -
ResNet-34 (Gehrig et al., 2019) | 92.…

…outputs, spikes are present only in the short paths from input to output of the networks. Consequently, the overall spiking activity is low, slowing down the convergence of the firing rate approximations.…”
Section: N-Cars
confidence: 99%
“…TSs represent the recent history of moving edges in a compact way (a 2D grid, also called motion history image in classical vision [48]) compared to other event representations [2], [49]. We use TSs because they are memory- and computationally efficient, informative, interpretable and because they have proven to be successful for motion (optical flow) [47], [50], [51] and depth estimation [21].…”
Section: A. Event Representation
confidence: 99%
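To make the time-surface (TS) construction in this statement concrete, a minimal sketch of an exponentially decayed time surface in the spirit of that literature might look as follows. The decay constant tau, the single polarity-agnostic channel, and all names are illustrative assumptions, not the cited authors' code.

```python
import numpy as np

def time_surface(x, y, t, height, width, tau):
    """Exponentially decayed time surface over one event window.

    Each pixel stores exp(-(t_now - t_last) / tau), where t_last is the
    timestamp of its most recent event; pixels with no events decay to 0.
    """
    t_last = np.full((height, width), -np.inf)
    for xi, yi, ti in zip(x, y, t):
        t_last[yi, xi] = ti  # later events overwrite earlier ones
    # Decay relative to the latest timestamp; values lie in (0, 1],
    # and exp(-inf) yields exactly 0 for pixels that never fired.
    return np.exp((t_last - t[-1]) / tau)
```

Because only the most recent timestamp per pixel is kept, the surface costs no more memory than a single frame, which is the compactness the quoted passage highlights.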