2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.435

Deep Quantization: Encoding Convolutional Activations with Deep Generative Model

Abstract: Deep convolutional neural networks (CNNs) have proven highly effective for visual recognition, where learning a universal representation from the activations of a convolutional layer is a fundamental problem. In this paper, we present Fisher Vector encoding with Variational AutoEncoder (FV-VAE), a novel deep architecture that quantizes the local activations of a convolutional layer with a deep generative model, trained in an end-to-end manner. To incorporate the FV encoding strategy into deep generative models, …
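
For illustration only, here is a minimal, hypothetical sketch of the idea the abstract describes: a small VAE is fit over the local activations of a convolutional layer, and a Fisher-Vector-style image code is read out as the gradient of the reconstruction log-likelihood with respect to the decoder (generative) parameters, aggregated over the local descriptors. The PyTorch framing, the names TinyVAE and fv_vae_encode, and the Gaussian observation model are assumptions for the sketch, not details taken from the paper.

import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Hypothetical toy VAE over D-dimensional local convolutional activations."""
    def __init__(self, dim=512, latent=128):
        super().__init__()
        self.enc = nn.Linear(dim, 2 * latent)   # predicts mean and log-variance
        self.dec = nn.Linear(latent, dim)       # generative (decoder) part

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(z), mu, logvar

def fv_vae_encode(vae, local_feats):
    """Aggregate local activations (N x D) into one Fisher-Vector-style code:
    the gradient of a Gaussian reconstruction log-likelihood w.r.t. decoder weights."""
    recon, mu, logvar = vae(local_feats)
    log_lik = -((recon - local_feats) ** 2).sum()            # Gaussian likelihood up to constants
    grads = torch.autograd.grad(log_lik, list(vae.dec.parameters()))
    fv = torch.cat([g.flatten() for g in grads])
    return fv / local_feats.shape[0]                         # average over local descriptors

# Usage: encode a 14x14 grid of 512-d convolutional activations into one vector.
feats = torch.randn(196, 512)
code = fv_vae_encode(TinyVAE(), feats)
print(code.shape)   # decoder weight + bias dimensions

The paper's actual encoder/decoder and its end-to-end training on top of the CNN are presumably richer than this toy version; the sketch only shows where a "Fisher Vector from a generative model" readout would sit.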

Cited by 62 publications (47 citation statements) | References 40 publications

Citation statements (ordered by relevance):
“…A similar trend is also observed in the classification setting. Our method achieves comparable performance to state-of-the-art video modeling approaches such as FV-VAE [31]. Note that TSN [42] is fully supervised and thus not directly comparable.…”
Section: Results on Action Recognition (mentioning, confidence: 90%)
“…In [34], the well-known two-stream architecture is devised by applying two 2D CNN architectures separately to visual frames and stacked optical flows. This two-stream architecture is further extended by exploiting convolutional fusion [5], spatio-temporal attention [24], temporal segment networks [41,42] and convolutional encoding [4,27] for video representation learning. Ng et al. [49] highlight the drawback of performing 2D CNN on video frames, in which long-term dependencies cannot be captured by the two-stream network.…”
Section: Related Work (mentioning, confidence: 99%)
“…CNN based Semantic Image Segmentation. Inspired by the success of CNNs on visual recognition [12,13,25,26,28,29,30], researchers have recently proposed various CNN based approaches for semantic segmentation. The typical way of applying CNNs to segmentation is through patch-by-patch scanning [9,23].…”
Section: Related Work (mentioning, confidence: 99%)