2022
DOI: 10.1109/tpami.2021.3089127

Fast Weakly Supervised Action Segmentation Using Mutual Consistency

Cited by 37 publications
(20 citation statements)
References 33 publications
“…Specifically, [9] encodes the entire video first before decoding it to frame-level action scores. The work in [4,6,27,38,47] use Dynamic Programming (DP) to infer the most likely actions and their duration given the entire video. Our method also uses a DP-based framework, but to our knowledge, we are the first to introduce a weakly-supervised method to segment a streaming video in an online manner.…”
Section: Related Work (mentioning)
confidence: 99%
“…Another important consideration in action understanding relates to requirements for processing the videos online versus offline, which is not addressed in existing weakly-supervised segmentation methods [6,27,47]. Online processing with low latency is an increasingly important part of interactive applications where real-time, or near real-time feedback is critical.…”
Section: Introduction (mentioning)
confidence: 99%
“…Two types of human activity recognition can be distinguished: (1) video data based, e.g. [44] and [39] and (2) inertial sensor data based activity recognition. An inertial sensor consists of at least an accelerometer and a gyroscope, but is often supplemented by a magnetometer.…”
Section: Related Work (mentioning)
confidence: 99%
“…They use a global length model for actions, which is updated during training. Souri et al [34] introduce an end-to-end method which does not use any decoding during training. They use a combination of a sequence-to-sequence model on top of a temporal convolutional network to learn the given transcript of actions while learning to temporally segment the video.…”
Section: Related Work (mentioning)
confidence: 99%
“…Since acquiring such annotations is very expensive, several works investigated methods to learn the models with less supervision. An example of weakly annotated training data are videos where only transcripts are provided [20,12,27,29,8,4,34,24]. While transcripts of videos can be obtained from scripts or subtitles, they are still costly to obtain.…”
Section: Introduction (mentioning)
confidence: 99%