2019 IEEE Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv.2019.00027

Coupled Generative Adversarial Network for Continuous Fine-Grained Action Segmentation

Abstract: We propose a novel conditional GAN (cGAN) model for continuous fine-grained human action segmentation that utilises multi-modal data and learned scene context information. The proposed approach employs two GANs, termed the Action GAN and the Auxiliary GAN: the Action GAN is trained to operate over the current RGB frame, while the Auxiliary GAN utilises supplementary information such as depth or optical flow. The goal of both GANs is to generate similar 'action codes', a vector representation of the current acti…
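As a rough illustration of the coupling idea in the abstract, the sketch below maps two modality-specific feature vectors to a shared "action code" space and measures how far apart the two codes are. This is a minimal NumPy sketch under stated assumptions, not the authors' architecture: all names and dimensions are hypothetical, and the adversarial discriminators of the actual cGAN training are omitted; a simple L2 coupling term stands in for the objective that drives the two generators toward similar codes.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    """Tiny two-layer perceptron standing in for a GAN generator."""
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

# Hypothetical feature/code sizes (assumptions, not from the paper).
d_rgb, d_aux, d_hid, d_code = 128, 64, 96, 32

# "Action GAN" generator: current RGB frame features -> action code.
W1_a, b1_a = rng.standard_normal((d_rgb, d_hid)) * 0.05, np.zeros(d_hid)
W2_a, b2_a = rng.standard_normal((d_hid, d_code)) * 0.05, np.zeros(d_code)

# "Auxiliary GAN" generator: depth / optical-flow features -> action code.
W1_x, b1_x = rng.standard_normal((d_aux, d_hid)) * 0.05, np.zeros(d_hid)
W2_x, b2_x = rng.standard_normal((d_hid, d_code)) * 0.05, np.zeros(d_code)

rgb_feat = rng.standard_normal(d_rgb)  # stand-in for an RGB frame embedding
aux_feat = rng.standard_normal(d_aux)  # stand-in for a depth/flow embedding

code_action = mlp(rgb_feat, W1_a, b1_a, W2_a, b2_a)
code_aux = mlp(aux_feat, W1_x, b1_x, W2_x, b2_x)

# Coupling objective: push the two modality-specific codes together.
coupling_loss = float(np.mean((code_action - code_aux) ** 2))
print(code_action.shape, coupling_loss >= 0.0)
```

Because both generators emit codes of the same dimensionality, the same downstream classifier can consume either code, which is what makes the auxiliary modality useful at training time.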

Cited by 23 publications (22 citation statements)
References 46 publications
“…cGAN [9] utilizes supplementary modalities including depth maps and optical flow with an auxiliary network. cGAN outperforms MS-TCN in terms of the F1 score and edit score on the 50Salads dataset.…”
Section: Results
confidence: 99%
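The F1 and edit scores cited here are the standard action-segmentation metrics: the edit score is the Levenshtein distance between the predicted and ground-truth sequences of action segments, normalised to [0, 100] so that a perfect ordering scores 100 regardless of segment lengths. A minimal sketch (function names are my own, not from any of the cited papers):

```python
from itertools import groupby

def to_segments(frame_labels):
    """Collapse per-frame labels into an ordered list of action segments."""
    return [label for label, _ in groupby(frame_labels)]

def edit_score(pred_frames, gt_frames):
    """Segmental edit score: 100 * (1 - normalised Levenshtein distance)."""
    p, g = to_segments(pred_frames), to_segments(gt_frames)
    m, n = len(p), len(g)
    # Standard dynamic-programming edit distance over segment labels.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i
    for j in range(n + 1):
        D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if p[i - 1] == g[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # deletion
                          D[i][j - 1] + 1,        # insertion
                          D[i - 1][j - 1] + sub)  # substitution
    return 100.0 * (1.0 - D[m][n] / max(m, n, 1))

# Same segment ordering ('cut' then 'mix') despite different lengths.
print(edit_score(['cut', 'cut', 'mix'], ['cut', 'mix', 'mix']))  # → 100.0
```

F1@{10, 25, 50} is the related segmental F1 score, where a predicted segment counts as a true positive if its temporal IoU with a ground-truth segment exceeds the given threshold (10%, 25%, or 50%).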
“…Since we use Multi-stage TCN [7] as our baseline model and develop our approaches […] because of two reasons: 1) S1 corresponds to low-level and transferable features with less discriminability, where DA shows limited effects [22]. 2) S1 captures less temporal information from neighbouring frames, i.e. a smaller temporal receptive field, which is critical for action segmentation.…”

[Results tables extracted mid-quote; values as recoverable, truncated entries left as-is.]

GTEA (F1@{10, 25, 50}, Edit, Acc):
ST-CNN [18]      58.7  54.4  41.9  49.1  60.6
Bi-LSTM [34]     66.5  59.0  43.6  -     55.5
ED-TCN [17]      72.2  69.3  56.0  -     64.0
TricorNet [4]    76.0  71.1  59.2  -     64.8
TDRN [19]        79.2  74.4  62.7  74.1  70.1
cGAN [9]         80.1  77.9  69.1  78.1  78.5
MS-TCN [7]       85.8  83.4  69.8  79.0  76.3
MS-TCN (FT) [7]  87.5  …

50Salads (F1@{10, 25, 50}, Edit, Acc):
IDT+LM [31]      44.4  38.9  27.8  45.8  48.7
Bi-LSTM [34]     62.6  58.3  47.0  55.6  55.7
ST-CNN [18]      55.9  49.6  37.1  45.9  59.4
ED-TCN [17]      68.0  63.9  52.6  59.8  64.7
TricorNet [4]    70.1  67.2  56.6  62.8  67.5
TDRN [19]        72.9  68.5  57.2  66.0  68.1
MS-TCN [7]       76.3  74.0  64.5  67.9  80.7
cGAN [9]         80.1  78.…

Section: Ablation Study and Analysis
confidence: 99%
“…TricorNet [8] utilizes a hybrid temporal convolutional and recurrent network to capture local motion and memorize long-term action dependencies. CoupledGAN [17] uses a GAN model to exploit multi-modal data to better model the evolution of human actions. Capturing long- and short-term information with multiple streams increases computational redundancy.…”
Section: Action Segmentation
confidence: 99%
“…Human action recognition approaches can be categorised into two types: discrete methods that operate on either still images [6] or pre-segmented videos [7,8,9]; and methods that operate over continuous fine-grained action videos [10,11,12]. Even though discrete methods have demonstrated greater performance [13,14], they are disconnected from real-world scenarios, which are invariably composed of fine-grained actions.…”
Section: Related Work
confidence: 99%