2019
DOI: 10.48550/arxiv.1904.03116
Preprint

Fast Weakly Supervised Action Segmentation Using Mutual Consistency

Abstract: Action segmentation is the task of predicting the actions in each frame of a video. As obtaining the full annotation of videos for action segmentation is expensive, weakly supervised approaches that can learn only from transcripts are appealing. In this paper, we propose a novel, end-to-end approach for weakly supervised action segmentation based on a two-branch neural network. The two branches of our network predict two redundant but different representations for action segmentation. We propose a novel mutual…
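The idea of two redundant representations can be illustrated with a minimal sketch. It assumes, as a hypothetical simplification, that one branch outputs per-frame class probabilities and the other an ordered list of actions with per-segment lengths (the lengths are an assumption; the abstract above is truncated). The segment-level output is expanded into frame labels and the frame-wise output is scored against it. The names `expand_segments` and `consistency_loss` are illustrative, and this hard, non-differentiable comparison is not the paper's mutual consistency loss.

```python
import math
from typing import List, Sequence


def expand_segments(actions: Sequence[int], lengths: Sequence[int]) -> List[int]:
    """Expand an ordered transcript plus per-segment lengths into one label per frame."""
    if len(actions) != len(lengths):
        raise ValueError("expected one length per action")
    frames: List[int] = []
    for action, length in zip(actions, lengths):
        frames.extend([action] * length)
    return frames


def consistency_loss(frame_probs: Sequence[Sequence[float]],
                     actions: Sequence[int],
                     lengths: Sequence[int],
                     eps: float = 1e-12) -> float:
    """Mean negative log-likelihood of the frame-wise branch under the labels
    implied by the segment-level branch (hypothetical, non-differentiable stand-in)."""
    target = expand_segments(actions, lengths)
    if len(target) != len(frame_probs):
        raise ValueError("segment lengths must sum to the number of frames")
    nll = [-math.log(max(probs[label], eps)) for probs, label in zip(frame_probs, target)]
    return sum(nll) / len(nll)


if __name__ == "__main__":
    # 4 frames, 2 action classes; the frame-wise branch favors action 0, then action 1.
    frame_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
    print(round(consistency_loss(frame_probs, [0, 1], [2, 2]), 3))  # 0.198: branches agree
    print(round(consistency_loss(frame_probs, [0, 1], [3, 1]), 3))  # 0.409: boundary disagrees
```

A lower value means the two representations agree more closely; moving the predicted segment boundary away from where the frame-wise branch changes its mind raises the loss.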


Cited by 4 publications (10 citation statements)
References 25 publications
“…Finally, we apply the k-Means algorithm combined with the Silhouette Score to find the optimal number of clusters in which each cluster corresponds to…”
Table excerpt (rank | method | supervision | score):
… | … | Unsupervised | 52.2
05 | CDFL [22] | Weakly Sup. | 50.2
06 | MuCon [37] | Weakly Sup. | 49.7
07 | D3TW [6] | Weakly Sup. | …
Section: Final Remarks (mentioning, confidence: 99%)
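The statement above mentions choosing the number of clusters with k-Means and the Silhouette Score. A minimal pure-Python sketch of that general technique follows; it is not the citing paper's code. The 2-D points, the candidate range of k, and the deterministic farthest-point initialization are illustrative assumptions (scikit-learn's `KMeans`, for comparison, defaults to k-means++ initialization). For each point, a is its mean distance to the other members of its cluster, b its mean distance to the nearest other cluster, and s = (b - a) / max(a, b).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means; returns one cluster label per point."""
    # Deterministic farthest-point initialization (an assumption for reproducibility).
    centers: List[Point] = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers: List[Point] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[j])  # keep the old center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette s = (b - a) / max(a, b) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette taken as 0
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points: Sequence[Point], candidates=range(2, 6)) -> Tuple[int, List[int]]:
    """Run k-means for each candidate k and keep the one with the highest silhouette."""
    best = None
    for k in candidates:
        if k >= len(points):
            break
        labels = kmeans(points, k)
        if len(set(labels)) < 2:
            continue
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    if best is None:
        raise ValueError("no valid clustering found")
    return best[1], best[2]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
    k, labels = choose_k(pts)
    print(k)  # 3
```

With three well-separated groups, k = 3 gives tight clusters (small a, large b) and therefore the highest mean silhouette; k = 2 merges two groups and k > 3 splits one, both of which lower the score.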
“…These subactions allow their model to learn fine-grained movements while still capturing mid- and long-range temporal information across frames. Another very recent proposal, by Souri et al [37], uses a two-branch network in which both branches predict the segmentation and both contribute to training. They propose a novel mutual consistency loss (MuCon) to enforce consistency between the two predictions.…”
Section: Temporal Action Segmentation (mentioning, confidence: 99%)
“…Weakly supervised methods bypass per-frame annotations and use labels such as ordered lists of actions (Ding and Xu 2018; Richard et al 2018; Chang et al 2019; Li, Lei, and Todorovic 2019; Souri et al 2019) or a small percentage of action time-stamps (Kuehne, Richard, and Gall 2018; Li, Farha, and Gall 2021; Chen et al 2020a) for all videos.…”
Section: Related Work (mentioning, confidence: 99%)
“…While these approaches have been very successful, they suffer from slow inference, since they iterate over all the training transcripts and select the one with the highest score. Souri et al [37] addressed this issue by predicting the transcript in addition to the frame-wise scores at inference time. While these approaches rely on cheap transcript supervision, their performance is much worse than that of fully supervised approaches.…”
Section: Related Work (mentioning, confidence: 99%)
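To make the inference cost described above concrete, here is a minimal sketch of transcript search: each candidate training transcript is aligned to the frame-wise log-probabilities with a simple dynamic program, and the highest-scoring transcript wins. It is a generic stand-in with hypothetical names (`align_score`, `decode_by_search`) and no length model, not the decoder of any specific paper. Its cost is O(K·T·N) for K training transcripts, T frames and N actions per transcript, so it grows linearly with the number of training transcripts; predicting the transcript directly, as the statement attributes to Souri et al [37], avoids this outer loop.

```python
import math
from typing import Optional, Sequence, Tuple


def align_score(log_probs: Sequence[Sequence[float]], transcript: Sequence[int]) -> float:
    """Best log-score of a monotonic alignment in which each transcript action
    covers at least one consecutive frame, in transcript order."""
    T, N = len(log_probs), len(transcript)
    if N == 0 or N > T:
        return -math.inf  # no valid alignment
    # dp[n] = best score so far with the current frame assigned to transcript position n
    dp = [-math.inf] * N
    dp[0] = log_probs[0][transcript[0]]
    for t in range(1, T):
        new = [-math.inf] * N
        for n in range(N):
            stay = dp[n]  # frame t continues the same segment
            advance = dp[n - 1] if n > 0 else -math.inf  # frame t starts the next segment
            new[n] = max(stay, advance) + log_probs[t][transcript[n]]
        dp = new
    return dp[N - 1]  # the last frame must belong to the last action


def decode_by_search(log_probs: Sequence[Sequence[float]],
                     training_transcripts: Sequence[Sequence[int]]
                     ) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Exhaustively score every training transcript and keep the best one."""
    best, best_score = None, -math.inf
    for transcript in training_transcripts:
        score = align_score(log_probs, transcript)
        if score > best_score:
            best, best_score = tuple(transcript), score
    return best, best_score


if __name__ == "__main__":
    probs = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    log_probs = [[math.log(p) for p in row] for row in probs]
    transcripts = [[0, 1, 2], [0, 2], [1, 2]]
    best, score = decode_by_search(log_probs, transcripts)
    print(best)  # (0, 1, 2)
```

Here the winning alignment labels the four frames 0, 0, 1, 2; every additional training transcript adds another full pass of the dynamic program.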