2021
DOI: 10.1109/access.2021.3099856

Teaching Yourself: A Self-Knowledge Distillation Approach to Action Recognition

Abstract: Knowledge distillation, the process of transferring complex knowledge learned by a heavy network (the teacher) to a lightweight network (the student), has emerged as an effective technique for compressing neural networks. To remove the need to train a large teacher network, this paper leverages the recent self-knowledge distillation approach to train a student network progressively by distilling its own knowledge, without a pre-trained teacher network. Unlike the existing self-knowledge …
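To make the progressive self-distillation idea concrete, below is a minimal PyTorch sketch, assuming the student periodically freezes a snapshot of itself and distills from that snapshot's softened predictions, so that no pre-trained teacher is required. The function names, temperature T, and weighting alpha are illustrative assumptions, not the paper's exact formulation.

import copy
import torch
import torch.nn.functional as F

def make_snapshot(student):
    # Freeze a copy of the student to serve as its own teacher.
    snapshot = copy.deepcopy(student).eval()
    for p in snapshot.parameters():
        p.requires_grad_(False)
    return snapshot

def self_distillation_loss(student, snapshot, x, y, T=4.0, alpha=0.5):
    # Cross-entropy on the ground-truth labels plus a KL term that
    # distills the student's own earlier (softened) predictions.
    logits = student(x)
    with torch.no_grad():
        past_logits = snapshot(x)  # the student's own past knowledge
    ce = F.cross_entropy(logits, y)
    kd = F.kl_div(
        F.log_softmax(logits / T, dim=1),
        F.softmax(past_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kd

Refreshing the snapshot every few epochs (snapshot = make_snapshot(student)) makes the distillation target progressively stronger as training proceeds.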

Cited by 30 publications (19 citation statements). References 39 publications.
“…A consistency loss, e.g., KL divergence, works as an efficient way to distill the knowledge in a trained black-box model. In addition, a previous study of knowledge distillation in a single domain (Guo et al., 2020; Vu et al., 2021) has also shown that a student model trained with distillation can be more general (Wang et al., 2021). Thus, our framework can be a viable solution for training a target-domain model with decent generalization ability.…”
Section: Discussion (mentioning)
confidence: 98%
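As a hedged illustration of the consistency loss described in this statement, the sketch below (assumed PyTorch; the teacher is treated as a black box whose output probabilities can be queried but whose internals and gradients are unavailable) computes the KL-divergence term pulling the student's predictive distribution toward the teacher's. The temperature T is an illustrative addition.

import torch.nn.functional as F

def consistency_loss(student_logits, teacher_probs, T=1.0):
    # KL divergence between the student's (softened) distribution and
    # probabilities queried from the trained black-box model; no
    # gradients flow into the teacher since only its outputs are used.
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        teacher_probs,
        reduction="batchmean",
    ) * (T * T)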
“…In contrast to the above methods, which leverage multi-scale spatiotemporal information, in [69] a dynamic equilibrium module is inserted into a 3D-CNN backbone to directly suppress the influence of spatiotemporal variations of actions in video. In another line of research, [70] uses a self-knowledge distillation approach to boost the performance of baseline 3D-CNN models (3D ResNet-18 and -50) on the task of action recognition.…”
Section: Top-down Approaches (mentioning)
confidence: 99%
“…Today, Convolutional Neural Network (CNN) models have achieved increasingly remarkable results on computer vision and image processing problems such as image classification [4], [5] and object detection in images [6], [7]. Lightweight CNN models have also received attention, with many variants such as [8]-[11], aimed at allowing models to be deployed on mobile and embedded devices in real time. In this paper, we introduce a modern deep learning based model to solve this problem.…”
Section: Introduction (unclassified)