Discriminative Video Representation Learning Using Support Vector Classifiers

Wang, Jue; Cherian, Anoop

doi:10.1109/tpami.2019.2937292

Cited by 8 publications

(6 citation statements)

References 85 publications

“…(%) I3D (Carreira & Zisserman, 2017) 80.9 Disc. Pool (Wang & Cherian, 2019) 81.3 DSP (Wang & Cherian, 2018) 81.5 Ours (I3D+full model) 81.8…”

Section: Methods (mentioning)

confidence: 99%

“…In this paper, we generalize this pooling for richer and better representation learning. While, we can easily train for the two losses L_C and L_R jointly in an end-to-end manner (Wang & Cherian, 2019), in this work, we deal with them separately so that we have better control of each of them. In the next few sections, we look deeper into the representation loss using a contrastive learning framework.…”

Section: Problem Formulation (mentioning)

confidence: 99%

“…Our work is also related to discriminative pooling (Wang & Cherian, 2019) that proposes to generate negative samples via passing random noise through an image-trained CNN, however is not adversarial. In (Wang & Cherian, 2018), the authors propose an adversarial setup in a discriminative representation learning framework, however uses a deterministic deep model to learn a single adversarial sample per data point.…”

Section: Related Work (mentioning)

confidence: 99%

Representation Learning via Adversarially-Contrastive Optimal Transport

Cherian, Aeron

2020

Preprint

Self Cite

In this paper, we study the problem of learning compact (low-dimensional) representations for sequential data that captures its implicit spatiotemporal cues. To maximize extraction of such informative cues from the data, we set the problem within the context of contrastive representation learning and to that end propose a novel objective via optimal transport. Specifically, our formulation seeks a low-dimensional subspace representation of the data that jointly (i) maximizes the distance of the data (embedded in this subspace) from an adversarial data distribution under the optimal transport, a.k.a. the Wasserstein distance, (ii) captures the temporal order, and (iii) minimizes the data distortion. To generate the adversarial distribution, we propose a novel framework connecting Wasserstein GANs with a classifier, allowing a principled mechanism for producing good negative distributions for contrastive learning, which is currently a challenging problem. Our full objective is cast as a subspace learning problem on the Grassmann manifold and solved via Riemannian optimization. To empirically study our formulation, we provide experiments on the task of human action recognition in video sequences. Our results demonstrate competitive performance against challenging baselines.

“…Aiming at the defects of linear weighting schemes that lack concerning features, Wang, Xiong [22] proposed an adaptive weighting method to automatically assign weights to clip-level results. Wang and Cherian [24] introduced the concept of a positive bag and a negative bag to find useful features. In our approach, the judgement of confidence through analyzing the form of the category probabilities is performed, then weights for each clip-level result are determined by confidence scores.…”

Section: Related Work (mentioning)

confidence: 99%

“…They are not well-suited for evaluating unequal discrimination of each clip. Some complicated aggregation methods [21][22][23][24] have also been proposed; for example, in study [23], a recurrent neural network (RNN) was designed to yield video-level scores. However, confidence of clip-level results is not well considered in these methods.…”

Section: Introduction (mentioning)

confidence: 99%

Whole-Body Keypoint and Skeleton Augmented RGB Networks for Video Action Recognition

Guo, Ying

2022

Applied Sciences

Incorporating multi-modality data is an effective way to improve action recognition performance. Based on this idea, we investigate a new data modality in which Whole-Body Keypoint and Skeleton (WKS) labels are used to capture refined body information. Unlike directly aggregated multi-modality, we leverage distillation to adapt an RGB network to classify action with the feature-extraction ability of the WKS network, which is only fed with RGB clips. Inspired by the success of transformers for vision tasks, we design an architecture that takes advantage of both three-dimensional (3D) convolutional neural networks (CNNs) and the Swin transformer to extract spatiotemporal features, resulting in advanced performance. Furthermore, considering the unequal discrimination among clips of a video, we also present a new method for aggregating the clip-level classification results, further improving the performance. The experimental results demonstrate that our framework achieves advanced accuracy of 93.4% with only RGB input on the UCF-101 dataset.

Tackling confusion among actions for action segmentation with adaptive margin and energy-driven refinement

Ma,

2024

Machine Vision and Applications
