2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00632
|View full text |Cite
|
Sign up to set email alerts
|

Multi-Agent Reinforcement Learning Based Frame Sampling for Effective Untrimmed Video Recognition

Abstract: Video Recognition has drawn great research interest and great progress has been made. A suitable frame sampling strategy can improve the accuracy and efficiency of recognition. However, mainstream solutions generally adopt handcrafted frame sampling strategies for recognition. It could degrade the performance, especially in untrimmed videos, due to the variation of frame-level saliency. To this end, we concentrate on improving untrimmed video classification via developing a learning-based frame sampling strate… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1
1

Citation Types

3
126
0

Year Published

2020
2020
2023
2023

Publication Types

Select...
5
2
1
1

Relationship

1
8

Authors

Journals

citations
Cited by 137 publications
(130 citation statements)
references
References 54 publications
3
126
0
Order By: Relevance
“…NetVlad [1], ActionVlad [8], AttentionClusters [19] are proposed for better local feature integration instead of directly average pooling as used. MARL [31] uses multiple agents as frame selectors instead of the general uniform sampler from the entire video for better global temporal modeling. Each agent learns a exibly moving policy through the temporal axis to get a vital representation frame and other agents' behavior as well.…”
Section: Long-term Modeling For Video Recognitionmentioning
confidence: 99%
“…NetVlad [1], ActionVlad [8], AttentionClusters [19] are proposed for better local feature integration instead of directly average pooling as used. MARL [31] uses multiple agents as frame selectors instead of the general uniform sampler from the entire video for better global temporal modeling. Each agent learns a exibly moving policy through the temporal axis to get a vital representation frame and other agents' behavior as well.…”
Section: Long-term Modeling For Video Recognitionmentioning
confidence: 99%
“…However, there are few for activity recognition especially for skeleton-based data. In [22], multiagent reinforcement learning is used to select key frames in videos where each agent is responsible for selecting a frame. As a result, the number of selected frames is fixed.…”
Section: B Reinforcement Learning In Activity Recognitionmentioning
confidence: 99%
“…For action recognition, Dong et al [ 62 ] proposed an attention-aware sampling agent based on deep reinforcement learning to select the most discriminative frame in the inference step to improve performance. Wu et al [ 63 ] proposed a frame sampling agent based on multiagent reinforcement learning to drop non-informative frames of untrimmed video. Zheng et al [ 64 ] used reinforcement learning agents to select effective segments for inference.…”
Section: Related Workmentioning
confidence: 99%