2015
DOI: 10.48550/arxiv.1503.04144
Preprint
Exploiting Image-trained CNN Architectures for Unconstrained Video Classification

Cited by 29 publications (29 citation statements)
References 26 publications
“…While the two-stream model also has the advantage of being trained specifically on a video dataset, we observe that the learned representations do not transfer favorably to the MED11 dataset in contrast to fc7 and fc6 features trained on ImageNet. A similar observation was made in [38,41], where simple CNN features trained from ImageNet consistently provided the best results.…”
Section: Event Retrieval (supporting)
confidence: 73%
“…Deep network features learned from spatial data [8,12,30] and temporal flow [30] have also shown comparable results. However, recent works in complex event recognition [38,41] have shown that spatial Convolutional Neural Network (CNN) features learned from ImageNet [2] without fine-tuning on video, accompanied by suitable pooling and encoding strategies achieves state-of-the-art performance. In contrast to these methods which either propose handcrafted features or learn feature representations with a fully supervised objective from images or videos, we try to learn an embedding in an unsupervised fashion.…”
Section: Related Work (mentioning)
confidence: 99%
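The quoted statement above notes that spatial CNN features trained on ImageNet, combined with a suitable pooling strategy, transfer well to video without fine-tuning. A minimal sketch of the simplest such strategy, average-pooling per-frame fc7 activations into one L2-normalized video-level descriptor (the function name and dimensions here are illustrative, not taken from the cited papers):

```python
import numpy as np

def video_descriptor(frame_features: np.ndarray) -> np.ndarray:
    """Average-pool per-frame CNN features (one row per frame,
    e.g. fc7 activations) into a single video-level descriptor,
    then L2-normalize it for use with a linear classifier."""
    pooled = frame_features.mean(axis=0)
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled

# Illustrative shapes: 120 frames, 4096-dim fc7 features.
feats = np.random.rand(120, 4096).astype(np.float32)
desc = video_descriptor(feats)
```

More elaborate encodings (e.g. Fisher Vectors or VLAD over frame features) follow the same pattern: per-frame extraction, then an order-invariant aggregation step.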
“…These recent advancements in machine learning have led to "deep learning", an extension dealing with deeper neural networks (DNNs), particularly deep CNNs. For these models, image classification remains one of the most popular and robust tasks [35,15], tasking the DNN with recognizing patterns in images. Image classification frequently serves as a benchmark for newly developed architectures and data augmentation methods [32,26,12,7].…”
Section: Introduction (mentioning)
confidence: 99%
“…PN uses an element-wise power operation to discount large values and increase small values of video representations. As one of the most significant improvements of the past few years, this simple algorithm essentially makes Fisher Vectors and VLADs useful in practice, and has been widely applied by the research community to both handcrafted [17,27] and deeply-learned features [28,29]. However, PN can only alleviate the sparse and bursty distribution problems.…”
Section: Introduction (mentioning)
confidence: 99%
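The power normalization (PN) described in the quote above is a one-line transform: each element is replaced by its signed power, sign(x)·|x|^α, with α = 0.5 being the common choice. A minimal sketch (function name is illustrative, not from the cited work):

```python
import numpy as np

def power_normalize(x: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Signed power normalization: sign(x) * |x|**alpha.
    With alpha in (0, 1), large values are discounted and small
    values are boosted, which reduces the bursty distribution of
    Fisher Vector / VLAD components."""
    return np.sign(x) * np.abs(x) ** alpha

v = np.array([-4.0, -0.25, 0.0, 0.25, 4.0])
print(power_normalize(v))  # [-2.  -0.5  0.   0.5  2. ]
```

In practice PN is usually followed by L2 normalization of the whole vector before the representation is fed to a linear classifier.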