RhyRNN: Rhythmic RNN for Recognizing Events in Long and Complex Videos
2020
DOI: 10.1007/978-3-030-58607-2_8
Cited by 5 publications (6 citation statements)
References 51 publications
“…Timeception [18], VideoGraph [19] and RhyRNN [54] … Furthermore, Table 10 shows the original reported results of Timeception and VideoGraph, which are lower than our re-implemented versions in both cases. Contrary to the standard splitting rule of the Breakfast dataset, both works used the last 15% of subjects in the dataset (8 subjects) to test their performance.…”
Section: Task Classification Results On 10 Classes Of the Breakfast D...
Mentioning confidence: 90%
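
The non-standard split described in the excerpt above amounts to holding out whole subjects rather than individual videos. A minimal sketch of such a subject-level split is given below in Python; the videos_by_subject mapping and the function name are hypothetical and not taken from any of the cited works.

def split_by_subjects(videos_by_subject, test_fraction=0.15):
    """Hold out the last `test_fraction` of subjects (not videos) for
    testing, so no subject contributes clips to both train and test."""
    subjects = sorted(videos_by_subject)                 # e.g. subject IDs
    n_test = max(1, round(len(subjects) * test_fraction))
    train_subjects, test_subjects = subjects[:-n_test], subjects[-n_test:]
    train = [v for s in train_subjects for v in videos_by_subject[s]]
    test = [v for s in test_subjects for v in videos_by_subject[s]]
    return train, test

# Example: with 52 subjects and test_fraction=0.15, 8 subjects are held out.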
“…In the scope of activity recognition, most works [13,20,53] study short-range or trimmed videos. Our work is closest to [18,19,54], where the focus is recognizing minutes-long activities. However, unlike them, our paper is on instructional videos and on how recognition can aid segmentation, so it relies on hierarchical activity labels (top-level task, lower-level attributes as targets for segmentation).…”
Section: Related Work
Mentioning confidence: 99%
“…Independence or modularity serves as a strong regularization or prior in some learning tasks under the static setting (Wang et al., 2020; Liu et al., 2020). In the sequential case, some early attempts with RNNs emphasized implicit "independence" between dimensions or channels in the feature space (Li et al., 2018; Yu et al., 2020). As the independence assumption commonly holds in vision tasks (with distinguishable objects), Pang et al. (2020) and Li et al. (2020b) proposed video understanding schemes that decouple the spatiotemporal patterns.…”
Section: Related Work
Mentioning confidence: 99%
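
To make the channel-wise "independence" mentioned in the excerpt above concrete, here is an illustrative sketch in the spirit of IndRNN (Li et al., 2018): each hidden channel carries its own scalar recurrent weight, so channels evolve independently over time and mix only through the input projection. The shapes and names are assumptions for illustration, not code from the cited papers.

import numpy as np

def indrnn_step(x_t, h_prev, W, u, b):
    # Each hidden channel has its own scalar recurrent weight in u,
    # so channels evolve independently and mix only through W.
    return np.maximum(0.0, x_t @ W + u * h_prev + b)  # ReLU activation

# Toy usage: batch of 2 sequences, input dim 4, hidden dim 3.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3)) * 0.1   # shared input-to-hidden projection
u = rng.uniform(0.5, 1.0, size=3)   # per-channel recurrent weights
b = np.zeros(3)
h = np.zeros((2, 3))
for t in range(5):
    h = indrnn_step(rng.normal(size=(2, 4)), h, W, u, b)
print(h.shape)  # (2, 3)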
“…Latent entities may not have exact physical meanings, but learning procedures can greatly benefit from such decoupling, as this assumption can be viewed as a strong regularization on the system. It has been successfully incorporated in several models for learning from regularly sampled sequential data by emphasizing some degree of "independence" between channels or groups in the feature space (Li et al., 2018; Yu et al., 2020; Goyal et al., 2021; Madan et al., 2021). Another successful counterpart benefiting in parallel from this assumption is the transformer (Vaswani et al., 2017), which stacks multiple layers of self-attention and point-wise feed-forward networks.…”
Section: Introduction
Mentioning confidence: 99%
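
For contrast with the recurrent models above, the sketch below is a stripped-down, single-head encoder block in the spirit of Vaswani et al. (2017): self-attention mixes information across time steps, and a point-wise feed-forward network is then applied to each step independently. Layer normalization and multi-head projections are omitted, and all shapes and weight names are illustrative assumptions rather than the reference implementation.

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def encoder_block(X, Wq, Wk, Wv, Wo, W1, b1, W2, b2):
    # Self-attention: every time step attends to every other step.
    d_k = Wq.shape[1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k)) @ V
    X = X + attn @ Wo                                # residual connection
    # Point-wise feed-forward net, applied to each step independently.
    ff = np.maximum(0.0, X @ W1 + b1) @ W2 + b2
    return X + ff                                    # residual connection

# Toy usage: a sequence of 6 steps with model dimension 8.
rng = np.random.default_rng(0)
d, d_ff = 8, 16
X = rng.normal(size=(6, d))
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
W1, b1 = rng.normal(size=(d, d_ff)) * 0.1, np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d)) * 0.1, np.zeros(d)
print(encoder_block(X, Wq, Wk, Wv, Wo, W1, b1, W2, b2).shape)  # (6, 8)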