2022
DOI: 10.48550/arxiv.2204.01692
Preprint

Long Movie Clip Classification with State-Space Video Models

Abstract: Most modern video recognition models are designed to operate on short video clips (e.g., 5-10s in length). Because of this, it is challenging to apply such models to long movie understanding tasks, which typically require sophisticated long-range temporal reasoning capabilities. The recently introduced video transformers partially address this issue by using long-range temporal self-attention. However, due to the quadratic cost of self-attention, such models are often costly and impractical to use. Instead, we…

Cited by 2 publications (2 citation statements)
References 47 publications
“…Recently, a new class of sequence models based on state space models (SSMs) [30,34,37,46] has emerged as a powerful general-purpose sequence modeling framework. SSMs scale nearly linearly in sequence length and have shown state-of-the-art performance on a range of sequence modeling tasks, from long range modeling [68] to language modeling [17,50], computer vision [39,53], and medical analysis [70].…”
Section: Introduction (mentioning)
confidence: 99%
“…Movie genre classification is a fundamental task for certain downstream tasks such as movie recommendation [12], understanding [25], editing [9], description [39], etc. Previous studies [16,53] have achieved unparalleled results in movie genre classification with a single modality such as posters, plot summaries, movie trailers, audio, or metadata.…”
Section: Introduction (mentioning)
confidence: 99%