2020 IEEE Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv45572.2020.9093535
Action Segmentation with Mixed Temporal Domain Adaptation

Abstract: The main progress for action segmentation comes from densely-annotated data for fully-supervised learning. Since manual annotation for frame-level actions is time-consuming and challenging, we propose to exploit auxiliary unlabeled videos, which are much easier to obtain, by shaping this problem as a domain adaptation (DA) problem. Although various DA techniques have been proposed in recent years, most of them have been developed only for the spatial direction. Therefore, we propose Mixed Temporal Domain Adapta…

Cited by 22 publications (11 citation statements)
References 37 publications
“…One category of works focuses on the specific action recognition task that aims to classify a video clip into a particular category of human actions via temporal alignment [8], temporal attention [49,13], or self-supervised video representation learning [46,13]. Another category of works focuses on action segmentation that simultaneously segments a video in time and classifies each segmented video clip with an action class via temporal alignment [9] or self-supervised video representation learning [10].…”
Section: Domain Adaptive Video Classification
confidence: 99%
“…Domain Adaptation for Videos. Prior works for video domain adaptation (DA) have focused on classification [6,11,28,42], segmentation [7,8] and localisation [2]. They use adversarial training to align the marginal distributions [28], an auxiliary self-supervised task [8,11,42], or attending to relevant frames for alignment [6][7][8].…”
Section: Related Work
confidence: 99%
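The adversarial alignment mentioned in this excerpt trains a frame-level domain discriminator with a binary cross-entropy loss; a gradient reversal layer then flips the discriminator's gradient so the feature extractor learns domain-confusing features. The sketch below is a minimal NumPy illustration of that loss (the function name and framing are my own, not from the cited papers):

```python
import numpy as np

def domain_adversarial_loss(d_logits, domain_labels):
    """Binary cross-entropy of a frame-level domain discriminator.

    d_logits: (N,) raw discriminator scores.
    domain_labels: (N,) with 0 = source frame, 1 = target frame.
    The discriminator minimizes this loss; a gradient reversal layer
    negates its gradient w.r.t. the features, so the feature extractor
    effectively maximizes it, aligning the marginal distributions.
    """
    p = 1.0 / (1.0 + np.exp(-d_logits))  # sigmoid probability of "target"
    eps = 1e-12                          # numerical guard for log(0)
    return float(-np.mean(domain_labels * np.log(p + eps)
                          + (1.0 - domain_labels) * np.log(1.0 - p + eps)))
```

An uninformative discriminator (all logits zero) yields the maximum-confusion loss of log 2, which is exactly the operating point the feature extractor pushes toward.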
“…However, not all the frame-level features contribute equally to the overall domain discrepancy. Therefore, inspired by [4,5], we assign larger attention weights to the features which have larger domain discrepancy, so that we can focus more on aligning those features, achieving more effective domain adaptation.…”
Section: Technical Approach Details
confidence: 99%
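The attention scheme this excerpt describes can be sketched as follows: frames on which a domain discriminator is confident (low prediction entropy) are treated as having larger domain discrepancy and receive larger pooling weights. The residual `1 + (1 - H)` form below is one plausible choice for illustration, not necessarily the exact formula of [4,5]:

```python
import numpy as np

def domain_attentive_pooling(features, d_probs):
    """Weight frame features by domain discrepancy before temporal pooling.

    features: (T, D) frame-level features.
    d_probs:  (T,) discriminator probability that each frame is target-domain.
    A confident discriminator (low binary entropy) marks a poorly aligned
    frame, which therefore gets more attention during pooling.
    """
    eps = 1e-12
    # per-frame binary entropy of the domain prediction, in nats
    h = -(d_probs * np.log(d_probs + eps)
          + (1.0 - d_probs) * np.log(1.0 - d_probs + eps))
    # residual attention in [1, 2]: 1 for fully ambiguous frames,
    # approaching 2 for confidently classified (misaligned) frames
    weights = 1.0 + (1.0 - h / np.log(2))
    weights = weights / weights.sum()     # normalize over time
    return weights @ features             # (D,) attentively pooled feature
```

With plain temporal pooling every frame would count equally; here a frame the discriminator separates easily contributes roughly twice as much as a fully ambiguous one.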
“…Local SSTDA is necessary to calculate the attention weights for DATP. Without this mechanism, frames will be aggregated in the same way as temporal pooling without cross-domain consideration, which has already been shown to be sub-optimal for cross-domain video tasks [4,5]. Domain Attentive Entropy (DAE): Minimum entropy regularization is a common strategy to perform more refined classifier adaptation.…”
Section: Technical Approach Details
confidence: 99%
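The Domain Attentive Entropy idea combines minimum entropy regularization on the class predictions with an attention weight derived from the domain prediction. The specific `1 + H(domain)` weighting below, attending more to domain-ambiguous frames, is a hypothetical choice for illustration rather than the paper's exact formula:

```python
import numpy as np

def domain_attentive_entropy(class_probs, d_probs):
    """Minimum entropy regularizer weighted by domain ambiguity (sketch).

    class_probs: (T, C) per-frame class probabilities.
    d_probs:     (T,) domain-discriminator outputs in (0, 1).
    Minimizing the returned value pushes class predictions toward
    confident (low-entropy) outputs, with more pressure on frames
    whose domain is hard to tell apart.
    """
    eps = 1e-12
    # per-frame entropy of the class prediction (the quantity minimized)
    h_class = -np.sum(class_probs * np.log(class_probs + eps), axis=1)
    # per-frame binary entropy of the domain prediction, used as attention
    h_dom = -(d_probs * np.log(d_probs + eps)
              + (1.0 - d_probs) * np.log(1.0 - d_probs + eps))
    weights = 1.0 + h_dom / np.log(2)   # in [1, 2]; peaks at d_probs = 0.5
    return float(np.mean(weights * h_class))
```

Confident one-hot class predictions drive the regularizer to zero regardless of the domain weights, which is the intended fixed point of entropy minimization.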