Robust Facial Pose Estimation Using Landmark Selection Method for Binocular Stereo Vision

Park, Jae-Seong; Heo, Suwoong; Lee, Kyungjune; Song, H. C.; Lee, Sanghoon

doi:10.1109/icip.2018.8451443

Cited by 6 publications

(5 citation statements)

References 20 publications


“…Zhu et al [23] attempted to solve the problem of estimating pose under challenging situations using a CNN network and achieved significant improvements over state-of-the-art methods. In [24], the authors presented a robust facial pose estimation technique based on landmarks but only on those predicted with a high confidence score. CNNs were used to measure this score, and then erroneous ones were removed.…”

Section: Related Work (mentioning)

confidence: 99%

A Deep Learning Framework for Monitoring Audience Engagement in Online Video Events

Vrochidis, Dimitriou, Krinidis et al. 2024

Int J Comput Intell Syst


This paper introduces a deep learning methodology for analyzing audience engagement in online video events. The proposed deep learning framework consists of six layers and starts with keyframe extraction from the video stream and the participants’ face detection. Subsequently, the head pose and emotion per participant are estimated using the HopeNet and JAA-Net deep architectures. Complementary to video analysis, the audio signal is also processed using a neural network that follows the DenseNet-121 architecture. Its purpose is to detect events related to audience engagement, including speech, pauses, and applause. With the combined analysis of video and audio streams, the interest and attention of each participant are inferred more accurately. An experimental evaluation is performed on a newly generated dataset consisting of recordings from online video events, where the proposed framework achieves promising results. Concretely, the F1 scores were 79.21% for interest estimation according to pose, 65.38% for emotion estimation, and 80% for sound event detection. The proposed framework has applications in online educational events, where it can help tutors assess audience engagement and comprehension while hinting at points in their lectures that may require further clarification. It is effective for video streaming platforms that want to provide video recommendations to online users according to audience engagement.
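The citation statement above describes pose estimation that relies only on landmarks predicted with a high confidence score. The following is a minimal sketch of that filtering step, not the cited authors' implementation: the `(x, y, confidence)` tuple layout, the `select_landmarks` name, the 0.8 threshold, and the sample values are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical layout: (x, y, confidence), confidence in [0, 1].
Landmark = Tuple[float, float, float]


def select_landmarks(
    landmarks: List[Landmark], min_confidence: float = 0.8
) -> List[Tuple[float, float]]:
    """Keep the (x, y) of landmarks whose score reaches the threshold.

    The threshold is an illustrative assumption, not a value taken
    from the cited paper.
    """
    return [(x, y) for x, y, conf in landmarks if conf >= min_confidence]


# Made-up detections: the second one has a low score and is dropped.
detections: List[Landmark] = [
    (120.0, 85.0, 0.95),
    (150.0, 90.0, 0.40),
    (135.0, 130.0, 0.88),
]
reliable = select_landmarks(detections)
```

Only the retained points would then be passed to whatever pose solver is in use, so a single badly localized landmark cannot skew the estimate.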



“…Of these methods, the most convenient one is the use of camera, which can be divided into monocular vision and binocular vision measurement [25]. Compared with monocular vision [26], binocular vision has one more distance information, but it inevitably involves camera calibration. If the camera's internal parameters are not accurate, the measurement error will be out of control.…”

Section: Related Work (mentioning)

confidence: 99%

Deep learning based six‐dimensional pose estimation in virtual reality

Yang, Lei, Tian et al. 2021

Computational Intelligence


Virtual reality technology, with its continuous development, is gradually applied to healthcare, education, business, and other fields. In the application of the technology, position and attitude estimation, as a space positioning technology, is indispensable. Traditional pose estimation has the problems of high dependence on environment and great complexity. But convolutional neural network (CNN) and other technologies with computational intelligence provide a strong guarantee for the progress of pose estimation. This article, based on the theory of CNN in deep learning, as well as monocular vision system and target sample set with markers, proposes a method for estimation of target position and attitude, and at the same time, describes in detail a general way of making dataset with markers based on simulation environment. In this article, the comparative experiments of different network structures show that this measurement method can avoid manual extraction of complex image features, and realize fast, arbitrary and accurate measurement, which plays a key role in pose and attitude measurement. Moreover, the visual correspondence between the world coordinate system and the pixel coordinate system is proved effectively by quaternion.
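The citation statement for this entry notes that binocular vision adds distance information but depends on accurate camera calibration. A small sketch of why, assuming the standard rectified pinhole stereo relation Z = f * B / d (focal length f in pixels, baseline B in metres, disparity d in pixels); the function name and the numbers are illustrative assumptions, not values from either paper.

```python
def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Illustrative setup: 800 px focal length, 10 cm baseline, 40 px disparity.
true_depth = stereo_depth(800.0, 0.10, 40.0)

# Same measurement, but the calibrated focal length is 5% too large.
biased_depth = stereo_depth(840.0, 0.10, 40.0)

# Depth is linear in f, so a 5% focal-length error becomes a 5% depth error.
relative_error = (biased_depth - true_depth) / true_depth
```

Because depth scales directly with the intrinsic parameters, any calibration bias carries straight through to every reconstructed point, which is the sensitivity the quoted passage warns about.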


“…, C, and w_k(x, y) represent the global illumination, the k-th local SH illumination, a set of clusters, and their contributions to (x, y), respectively. Note that each illumination follows the SH lighting formula in (2). Unlike in [24], we compute each region for local SH using a pixel clustering method, such as simple linear iterative clustering (SLIC) [38], in the luminance domain.…”

Section: A Local SH Model (mentioning)

confidence: 99%

“…The analysis of facial geometry and appearance is a classical problem and its applications are related to many computer vision and graphics tasks such as face recognition [1], pose estimation [2]-[4], and facial animation [5]. 3D face reconstruction, which is the process of inferring the 3D geometry of a human face from 2D images, is the most very fundamental core that powers those applications.…”

Section: Introduction (mentioning)

confidence: 99%

Local Spherical Harmonics for Facial Shape and Albedo Estimation

et al. 2020

Self Cite


In this paper, we present a novel facial albedo and 3D shape recovery method with a local spherical harmonic illumination model. From a face in an image, the proposed method can produce a high-quality 3D shape and albedo using a novel parameterization of local illuminations. Because a facial shape is partially convex, a single spherical harmonics is generally used for the illumination of a face within a constrained illumination environment. However, when a facial image is captured in an unconstrained scene, the illumination is inappropriately estimated due to the presence of shadow and specular reflections. To address this issue, we propose a novel local spherical harmonic illumination model for representing the illumination of a face. Unlike the existing parameterization of local illumination, our local spherical harmonic illumination model utilizes a smooth weight function for the seamless representation of natural illumination. Therefore, the albedo and shape information in an image can be precisely estimated using the first-order spherical harmonics. For accurate estimation of albedo, we also utilize facial albedo statistics to prevent the estimated albedo from becoming biased toward input image. Furthermore, we developed an accurate and reliable 3D shape reconstruction method from a normal map based on tetrahedron-based deformation. Comparing to the Laplacian deformation based method, our method is applicable to any mesh regardless of its structure. Through rigorous experiments, we demonstrate that the proposed local spherical harmonic illumination model is effective in estimating the complex illumination and can recover a high-quality facial albedo and 3D shape.

