2021
DOI: 10.1109/tpami.2019.2937292

Discriminative Video Representation Learning Using Support Vector Classifiers

Abstract: Most popular deep models for action recognition in videos generate independent predictions for short clips, which are then pooled heuristically to assign an action label to the full video segment. As not all frames may characterize the underlying action (indeed, many are common across multiple actions), pooling schemes that impose equal importance on all frames might be unfavorable. In an attempt to tackle this problem, we propose discriminative pooling, based on the notion that among the deep features generated …

Cited by 8 publications (6 citation statements)
References: 85 publications
“…(%): I3D (Carreira & Zisserman, 2017) 80.9; Disc. Pool (Wang & Cherian, 2019) 81.3; DSP (Wang & Cherian, 2018) 81.5; Ours (I3D+full model) 81.8…”
Section: Methods (mentioning)
confidence: 99%
“…In this paper, we generalize this pooling for richer and better representation learning. While we can easily train for the two losses L_C and L_R jointly in an end-to-end manner (Wang & Cherian, 2019), in this work, we deal with them separately so that we have better control of each of them. In the next few sections, we look deeper into the representation loss using a contrastive learning framework.…”
Section: Problem Formulation (mentioning)
confidence: 99%
“…Aiming at the defects of linear weighting schemes that lack concerning features, Wang, Xiong [22] proposed an adaptive weighting method to automatically assign weights to clip-level results. Wang and Cherian [24] introduced the concept of a positive bag and a negative bag to find useful features. In our approach, the judgement of confidence through analyzing the form of the category probabilities is performed, then weights for each clip-level result are determined by confidence scores.…”
Section: Related Work (mentioning)
confidence: 99%
“…They are not well-suited for evaluating unequal discrimination of each clip. Some complicated aggregation methods [21][22][23][24] have also been proposed; for example, in study [23], a recurrent neural network (RNN) was designed to yield video-level scores. However, confidence of clip-level results is not well considered in these methods.…”
Section: Introduction (mentioning)
confidence: 99%