ICASSP 2019 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019
DOI: 10.1109/icassp.2019.8683606
Neuromorphic Vision Sensing for CNN-based Action Recognition

Abstract: Neuromorphic vision sensing (NVS) hardware is now gaining traction as a low-power/high-speed visual sensing technology that circumvents the limitations of conventional active pixel sensing (APS) cameras. While object detection and tracking models have been investigated in conjunction with NVS, there is currently little work on NVS for higher-level semantic tasks, such as action recognition. Contrary to recent work that considers homogeneous transfer between flow domains (optical flow to motion vectors), we pro…

Cited by 17 publications (15 citation statements) · References 17 publications
“…Similarly, Ghosh et al. partitioned events into a three-dimensional grid of voxels, where spatio-temporal filters are used to learn features, and the learnt features are fed as input to CNNs for action recognition [16]. Chadha et al. [31] generated frames by summing the event polarities at each pixel address, then fed them into a multi-modal teacher-student framework for action recognition. While useful as early-stage attempts, these frame-based methods are not well suited to the sparse and asynchronous nature of neuromorphic events, since the frame sizes that must be processed are substantially larger than the original NVS streams.…”
Section: Related Work (mentioning)
confidence: 99%
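The frame-construction step described in this excerpt (summing event polarities at each pixel address) is straightforward to reproduce. Below is a minimal NumPy sketch, assuming events arrive as (x, y, t, p) rows with polarity p ∈ {−1, +1}; the function name and array layout are illustrative assumptions, not the exact preprocessing of [31].

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate an event stream into one frame by summing the
    polarity of all events landing at each pixel address.

    events: float array of shape (N, 4) with columns (x, y, t, p),
            where p is +1 or -1 (layout assumed for illustration).
    """
    frame = np.zeros((height, width), dtype=np.float32)
    ys = events[:, 1].astype(int)
    xs = events[:, 0].astype(int)
    # np.add.at accumulates correctly at repeated (y, x) addresses,
    # unlike plain fancy-indexed assignment.
    np.add.at(frame, (ys, xs), events[:, 3])
    return frame
```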
“…They demonstrate the usability of event data for HAR compared to conventional camera-based vision systems, where complex optical-flow estimation is required. A similar approach was followed in [40], using two CNNs to learn features from event frames and the corresponding optical flow from the original RGB video. Converting the events into frames is also applied in [43] to classify the actions of the neuromorphic version of the UCF11 dataset.…”
Section: Related Work (mentioning)
confidence: 99%
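The two-CNN design this excerpt attributes to [40] follows the familiar two-stream pattern: one network over accumulated event frames, one over optical flow, fused before classification. The PyTorch sketch below shows that generic pattern; the layer sizes, the `TwoStreamEventNet` name, and the late-fusion choice are assumptions for illustration, not the architecture of [40].

```python
import torch
import torch.nn as nn

class TwoStreamEventNet(nn.Module):
    """Generic two-stream sketch: an event-frame stream and an
    optical-flow stream, fused by feature concatenation."""

    def __init__(self, num_classes):
        super().__init__()
        def stream(in_ch):
            # Tiny CNN backbone per modality (illustrative sizes).
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.event_stream = stream(1)  # 1-channel polarity frame
        self.flow_stream = stream(2)   # 2-channel (dx, dy) flow
        self.classifier = nn.Linear(32 + 32, num_classes)

    def forward(self, event_frame, flow):
        # Late fusion: concatenate per-stream features, then classify.
        feats = torch.cat([self.event_stream(event_frame),
                           self.flow_stream(flow)], dim=1)
        return self.classifier(feats)
```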
“…The existing work that uses hand-crafted features has achieved an accuracy of 75.13% [39], while works using deep learning have achieved accuracies ranging from 51.5% to 92.9% [40]–[43], as detailed in Section IV-B. It can also be observed that the accuracy of these methods depends on the quality of the constructed event frames.…”
Section: Related Work (mentioning)
confidence: 99%
“…As for the number of parameters, each convolution layer in both CNNs and GCNs has $(C_{\text{in}} K^2 + 1)\,C_{\text{out}}$ parameters, while each fully connected layer has $(C_{\text{in}} + 1)\,C_{\text{out}}$. As shown by (11), the FLOPs of a graph convolution depend on the number of edges and nodes. Since the size of the input graph varies per dataset, we opt to report representative results from N-Caltech101 in Table 2.…”
Section: Object Classification (mentioning)
confidence: 99%
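The two parameter-count formulas quoted above are easy to verify numerically. The helper functions below (hypothetical names, bias terms included) evaluate them for illustrative layer sizes.

```python
def conv_params(c_in, c_out, k):
    # Convolution with bias: (C_in * K^2 + 1) * C_out
    return (c_in * k * k + 1) * c_out

def fc_params(c_in, c_out):
    # Fully connected layer with bias: (C_in + 1) * C_out
    return (c_in + 1) * c_out

# Worked example (illustrative sizes): a 3x3 convolution mapping
# 64 -> 128 channels, and a 512 -> 10 classifier head.
print(conv_params(64, 128, 3))  # (64*9 + 1) * 128 = 73856
print(fc_params(512, 10))       # (512 + 1) * 10 = 5130
```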