Humans possess an intrinsic ability to hide their true emotions. Micro-expressions are subtle changes in facial muscles that are involuntary by nature and easy to hide. To recognize them, several machine and deep learning models have been proposed in the past few years. The convolutional neural network (CNN) is a deep learning method that has been widely adopted in vision-related tasks due to its remarkable performance. However, a CNN suffers from overfitting due to its large number of trainable parameters. Additionally, a CNN cannot capture global information about an input image. Furthermore, identifying the regions that are important for classifying micro-expressions is a challenging task. The self-attention mechanism addresses these issues by focusing on key areas, and transformers built on it for images, known as vision transformers, are widely explored in vision-related applications. However, existing vision transformers divide an input image into a fixed number of patches, so the local correlation of image pixels is lost. Further, a vision transformer relies on the self-attention mechanism, which effectively captures global dependencies but does not exploit the local spatial relationships in an image. In this work, we propose a vision transformer based on convolution patches to overcome this problem. The proposed algorithm generates c feature maps from each input image by convolving it with c filters. These feature maps are then fed to a transformer model as fixed-size image patches to perform classification. Thus, the proposed architecture leverages the advantages of both convolutional layers and the transformer, capturing local spatial information and global dependencies, respectively, which leads to improved performance.
The proposed model is evaluated on three benchmark datasets, CASME-I, CASME-II, and SAMM, and compared with state-of-the-art machine and deep learning models. It achieves classification accuracies of 95.97%, 98.59%, and 100% on the three datasets, respectively.
INDEX TERMS Facial expression recognition, deep learning, micro-expression recognition, self-attention, vision transformer.
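The convolution-patch idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the image size, the number of filters `c`, the kernel size, the embedding width `d`, the random weights, and the helper names `conv2d_valid`, `conv_patch_tokens`, and `self_attention` are all assumptions chosen for illustration. It shows only the data flow: c filters produce c feature maps, each map becomes one token, and a single self-attention step mixes the tokens globally.

```python
import numpy as np


def conv2d_valid(image, kernels):
    """Cross-correlate a 2-D image with c square kernels (no padding, stride 1).

    Returns an array of shape (c, H_out, W_out): one feature map per kernel.
    """
    c, k, _ = kernels.shape
    h_out = image.shape[0] - k + 1
    w_out = image.shape[1] - k + 1
    out = np.empty((c, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            window = image[i:i + k, j:j + k]
            # Contract each (k, k) kernel with the (k, k) window -> c responses.
            out[:, i, j] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out


def conv_patch_tokens(image, kernels, w_embed):
    """Turn each of the c feature maps into one d-dimensional token."""
    maps = conv2d_valid(image, kernels)        # (c, H_out, W_out)
    flat = maps.reshape(maps.shape[0], -1)     # (c, H_out * W_out)
    return flat @ w_embed                      # (c, d)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights


# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
c, k, d = 8, 3, 32
kernels = rng.standard_normal((c, k, k)) * 0.1
w_embed = rng.standard_normal(((16 - k + 1) ** 2, d)) * 0.05
w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

tokens = conv_patch_tokens(image, kernels, w_embed)
attended, weights = self_attention(tokens, w_q, w_k, w_v)
print(tokens.shape, attended.shape)
```

In a trained model the kernels and projection matrices would be learned, and the attended tokens would pass through further transformer layers and a classification head; those parts are omitted here.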
A micro-expression (ME) is the leakage that occurs when people try to conceal their involuntary, uncontrolled emotional reaction to an event in a high-risk setting. Because the person experiencing the emotion tries to suppress it under risk, it appears on the face at low intensity, in a specific region, and for a very short time. Because the expression arises involuntarily, it is entirely natural rather than fake. Accurate detection of these natural expressions would allow them to be used effectively in many fields, such as forensics, clinical practice, and education. In this study, the model built for ME recognition applies preprocessing, feature extraction, feature selection, and classification, in that order. The proposed model uses CASME-II, one of the most widely used publicly available ME datasets in the literature. In the preprocessing stage, the onset and apex frames are selected from the image sequence of each video clip for use by the optical flow algorithms. From these two frames, horizontal and vertical optical flow images are computed with Farneback, TV-L1 Dual, and TV-L1; features are then extracted from these optical flow images using Xception, a convolutional neural network (CNN) model, and Gabor, a traditional model. Recursive feature elimination with cross-validation (RFECV) is used to keep the discriminative features. Finally, a linear support vector classifier (SVC) assigns the filtered ME features to three classes: positive, negative, and surprise. The proposed ME model achieves an accuracy of 0.9248.
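The feature-selection and classification stages described above (recursive feature elimination with cross-validation followed by a linear support vector classifier) can be sketched with scikit-learn. This is only an illustration under stated assumptions: the feature matrix is synthetic random data standing in for the Xception and Gabor features, and the sample count, feature count, fold count, and `C` value are hypothetical, not taken from the study.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC

# Synthetic stand-in for extracted features (assumption, not real ME data):
# 90 samples, 20 features, three classes (e.g. positive, negative, surprise).
rng = np.random.default_rng(0)
n_samples, n_features = 90, 20
y = np.repeat([0, 1, 2], n_samples // 3)
X = rng.standard_normal((n_samples, n_features))
X[:, :4] += y[:, None] * 1.5   # only the first 4 columns carry class signal

# Recursive feature elimination with cross-validation: repeatedly drop the
# feature with the smallest linear-SVM weight and keep the subset size that
# gives the best cross-validated accuracy.
selector = RFECV(
    estimator=LinearSVC(C=1.0, max_iter=10000),
    step=1,
    cv=StratifiedKFold(n_splits=5),
    scoring="accuracy",
)
selector.fit(X, y)
X_selected = selector.transform(X)

# Final linear support vector classifier trained on the selected features.
clf = LinearSVC(C=1.0, max_iter=10000).fit(X_selected, y)
print(selector.n_features_, X_selected.shape)
```

For an unbiased accuracy estimate, the selection step would normally be fitted inside each training fold rather than on the full dataset as done in this short sketch.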