Unsupervised and Semi-Supervised Domain Adaptation for Action Recognition from Drones

Choi, Jin-Hwan; Sharma, Gaurav; Chandraker, Manmohan; Huang, Jia-Bin

doi:10.1109/wacv45572.2020.9093511

Cited by 61 publications

(40 citation statements)

References 37 publications


“…Task-specific: many aerial human detection datasets are aimed for specific tasks such as sport [23], search and rescue [94], synthetic data from game engines [132], and multiview [113]. AVI [127] is for violent recognition from aerial videos.…”

Section: Datasets For Aerial Action Recognition (mentioning)

confidence: 99%

“…3D CNNs: 3D CNNs are still the most popular networks for aerial action recognition. Among the modern networks, I3D [19] has been widely adopted for aerial action recognition [23], [27], [80], [98], [132]. C3D [139] has also been utilized for aerial action recognition [24], [98].…”

Section: Two-stream CNNs (mentioning)

confidence: 99%

“…Due to the high cost of capturing and labeling large scale aerial videos with diverse actions, the aerial action recognition community has addressed this challenge by domain adaptation [23] and knowledge distillation [32]. Choi et al [23] proposed domain adaptation approaches to leverage existing annotated action datasets and unannotated aerial videos.…”

Section: Lack Of Data (mentioning)

confidence: 99%


The State of Aerial Surveillance: A Survey

Nguyen, Fookes, Sridharan et al. 2022

Preprint


The rapid emergence of airborne platforms and imaging sensors is enabling new forms of aerial surveillance due to their unprecedented advantages in scale, mobility, deployment and covert observation capabilities. This paper provides a comprehensive overview of human-centric aerial surveillance tasks from a computer vision and pattern recognition perspective. It aims to provide readers with an in-depth systematic review and technical analysis of the current state of aerial surveillance tasks using drones, UAVs and other airborne platforms. The main object of interest is humans, where single or multiple subjects are to be detected, identified, tracked, re-identified and have their behavior analyzed. More specifically, for each of these four tasks, we first discuss unique challenges in performing these tasks in an aerial setting compared to a ground-based setting. We then review and analyze the aerial datasets publicly available for each task, and delve deep into the approaches in the aerial literature and investigate how they presently address the aerial challenges. We conclude the paper with discussion on the missing gaps and open research questions to inform future research avenues.


“…recently. Early attempts align distributions across the source and target domains using hand-crafted features [3,67], while recent deep learning based methods [18,37,9,6,10,36] leverage the insight from UDA on image classification and extend it to the video case. For instance, approaches [6,37] utilize adversarial feature alignment [14,54] and propose a temporal version with attention modules.…”

Section: Related Work (mentioning)

confidence: 99%

“…Existing methods have made significant progress in image-based tasks, such as classification [33,14,54,42], semantic segmentation [16,53,56,31,38] and object detection [8,43,24,17]. While several works have sought to extend this success to video-based tasks like action recognition by aligning appearance (e.g., RGB) features through adversarial learning [6,9,37], challenges persist in video adaptation tasks due to the greater complexity of the video data. Moreover, different from the image data, domain shifts in videos for action recognition often involve more complicated environments, which increases the difficulty for adaptation.…”

Section: Introduction (mentioning)

confidence: 99%

Learning Cross-modal Contrastive Features for Video Domain Adaptation

Kim, Tsai, Zhuang et al. 2021

Preprint

Self Cite


Learning transferable and domain adaptive feature representations from videos is important for video-relevant tasks such as action recognition. Existing video domain adaptation methods mainly rely on adversarial feature alignment, which has been derived from the RGB image space. However, video data is usually associated with multi-modal information, e.g., RGB and optical flow, and thus it remains a challenge to design a better method that considers the cross-modal inputs under the cross-domain adaptation setting. To this end, we propose a unified framework for video domain adaptation, which simultaneously regularizes cross-modal and cross-domain feature representations. Specifically, we treat each modality in a domain as a view and leverage the contrastive learning technique with properly designed sampling strategies. As a result, our objectives regularize feature spaces, which originally lack the connection across modalities or have less alignment across domains. We conduct experiments on domain adaptive action recognition benchmark datasets, i.e., UCF, HMDB, and EPIC-Kitchens, and demonstrate the effectiveness of our components against state-of-the-art algorithms.
