With the rapid growth of wearable computing and the increasing demand for mobile authentication, voiceprint-based authentication has become one of the prevalent technologies and has shown tremendous potential. However, it is vulnerable to voice spoofing attacks (e.g., replay attacks and synthetic voice attacks). To address this threat, we propose a new biometric authentication approach, named EarPrint, which extends voiceprint to build a hidden and secure user authentication scheme on earphones. EarPrint builds on speaking-induced body sound transmission from the throat to the ear canal, i.e., different users exhibit different body sound conduction patterns at each ear. As the first exploratory study, extensive experiments on 23 subjects show that EarPrint is robust against ambient noise and body motion. EarPrint achieves an Equal Error Rate (EER) of 3.64% with 75 seconds of enrollment data. We also evaluate the resilience of EarPrint against replay attacks. A major contribution of EarPrint is that it leverages two levels of uniqueness, namely the body sound conduction from the throat to the ear canal and the body asymmetry between the left and right ears, taking advantage of the paired form factor of earphones. Compared with other mobile and wearable biometric modalities, EarPrint is a low-cost, accurate, and secure authentication solution for earphone users.
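Since the evaluation above centers on the Equal Error Rate, a minimal sketch of how an EER is typically computed from verification scores may be useful; the synthetic score distributions and the threshold sweep below are illustrative assumptions, not EarPrint's actual matcher.

```python
# Minimal EER sketch (illustrative; not EarPrint's actual pipeline).
import numpy as np

def compute_eer(genuine_scores, impostor_scores):
    """Sweep thresholds and return the point where the false acceptance
    rate (FAR) and false rejection rate (FRR) are closest."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, eer, eer_thr = np.inf, 1.0, 0.0
    for thr in thresholds:
        far = np.mean(impostor_scores >= thr)  # impostors accepted
        frr = np.mean(genuine_scores < thr)    # genuine users rejected
        if abs(far - frr) < best_gap:
            best_gap, eer, eer_thr = abs(far - frr), (far + frr) / 2, thr
    return eer, eer_thr

# Toy example with synthetic matcher scores (assumed distributions).
rng = np.random.default_rng(0)
genuine = rng.normal(0.8, 0.1, 500)    # scores for the true user
impostor = rng.normal(0.5, 0.1, 500)   # scores for attackers
eer, thr = compute_eer(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {thr:.3f}")
```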
With the rapid growth of artificial intelligence and mobile computing, intelligent speech interfaces have recently become one of the prevalent trends and have shown huge potential. To address privacy leakage during speech interaction and to accommodate special demands, silent speech interfaces have been proposed to enable people to communicate without vocalizing (e.g., via lip reading or tongue tracking). However, most existing silent speech mechanisms require either background illumination or additional wearable devices. In this study, we propose EchoWhisper, a novel user-friendly, smartphone-based silent speech interface. The proposed technique takes advantage of the micro-Doppler effect of the acoustic wave resulting from mouth and tongue movements and assesses the acoustic features of beamformed reflected echoes captured by the dual microphones in the smartphone. In a study with human subjects performing a daily conversation task covering over 45 different words, our system achieves a word error rate (WER) of 8.33%, which shows the effectiveness of inferring silent speech content. Moreover, EchoWhisper demonstrates reliability and robustness across a variety of configuration settings and environmental factors, such as smartphone orientation and distance, ambient noise, and body motion.
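To make the micro-Doppler idea concrete, here is a minimal sketch that isolates Doppler sidebands around a near-inaudible probe tone with an STFT; the 20 kHz carrier, 48 kHz sample rate, and band limits are assumptions, since the abstract does not specify EchoWhisper's parameters.

```python
# Micro-Doppler sketch (assumed parameters; not EchoWhisper's pipeline).
import numpy as np
from scipy.signal import stft

FS = 48_000          # assumed sample rate (Hz)
F_CARRIER = 20_000   # assumed near-inaudible probe tone (Hz)

def micro_doppler_spectrogram(mic_signal):
    """STFT magnitude in a narrow band around the carrier; articulator
    motion appears as Doppler sidebands above/below F_CARRIER."""
    f, frames, Z = stft(mic_signal, fs=FS, nperseg=2048, noverlap=1536)
    band = (f > F_CARRIER - 500) & (f < F_CARRIER + 500)
    return f[band], frames, np.abs(Z[band, :])

# Toy received signal: steady carrier plus a weak FM echo whose
# instantaneous frequency wobbles +/-50 Hz at 3 Hz (simulated motion).
t = np.arange(FS) / FS  # one second of samples
inst_freq = F_CARRIER + 50 * np.sin(2 * np.pi * 3 * t)
echo = 0.1 * np.cos(2 * np.pi * np.cumsum(inst_freq) / FS)
rx = np.cos(2 * np.pi * F_CARRIER * t) + echo
freqs, frames, spec = micro_doppler_spectrogram(rx)
print(spec.shape)  # (bins within +/-500 Hz of the carrier, time frames)
```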
Intelligent speech interfaces have been developing rapidly to support the growing demand for convenient control of and interaction with wearable/earable and portable devices. To avoid privacy leakage during speech interactions and strengthen resistance to ambient noise, silent speech interfaces have been widely explored to enable people to interact with mobile/wearable devices without audible sounds. However, most existing silent speech solutions require either restricted background illumination or hand involvement to hold the device or perform gestures. In this study, we propose a novel earphone-based, hands-free silent speech interaction approach, named EarCommand. Our technique exploits the relationship between the deformation of the ear canal and the movements of the articulators, and takes advantage of this link to recognize different silent speech commands. Our system achieves a word error rate (WER) of 10.02% for word-level recognition and 12.33% for sentence-level recognition when tested on human subjects with 32 word-level commands and 25 sentence-level commands, which indicates the effectiveness of inferring silent speech commands. Moreover, EarCommand shows high reliability and robustness in a variety of configuration settings and environmental conditions. We anticipate that EarCommand can serve as an efficient, intelligent speech interface for hands-free operation, significantly improving the quality and convenience of interaction.
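The word error rate reported here is the standard Levenshtein-distance metric over word sequences; the following minimal sketch computes it (the example transcripts are invented for illustration).

```python
# Standard WER via word-level edit distance (example transcripts invented).
def wer(reference: str, hypothesis: str) -> float:
    r, h = reference.split(), hypothesis.split()
    # dp[i][j]: min edits turning the first i ref words into j hyp words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i                      # deletions
    for j in range(len(h) + 1):
        dp[0][j] = j                      # insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / len(r)

print(wer("turn on the light", "turn of the light"))  # 0.25
```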
We propose SonicASL, a real-time gesture recognition system that can recognize sign language gestures on the fly, leveraging front-facing microphones and speakers added to commodity earphones worn by someone facing the person making the gestures. In a user study (N=8), we evaluate the recognition performance of various sign language gestures at both the word and sentence levels. Given 42 frequently used individual words and 30 meaningful sentences, SonicASL achieves an accuracy of 93.8% and 90.6% for word-level and sentence-level recognition, respectively. The proposed system is tested in two real-world scenarios: indoor (apartment, office, and corridor) and outdoor (sidewalk) environments with pedestrians walking nearby. The results show that our system provides users with an effective gesture recognition tool that is highly reliable against environmental factors such as ambient noise and nearby pedestrians.
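One plausible acoustic front end for such an earphone speaker/microphone pair is matched-filter echo profiling; the chirp parameters and the simulated hand echo below are assumptions for illustration, not SonicASL's published design.

```python
# Matched-filter echo-profiling sketch (assumed waveform; illustrative only).
import numpy as np
from scipy.signal import chirp, correlate

FS = 48_000                              # assumed sample rate (Hz)
t = np.arange(0, 0.01, 1 / FS)           # 10 ms probe
probe = chirp(t, f0=17_000, f1=23_000, t1=t[-1])  # near-inaudible sweep

# Simulated microphone signal: direct path plus one weaker, delayed echo
# off a hand ~0.4 m away (round trip of 120 samples, i.e. 2.5 ms).
rx = np.zeros(2 * len(probe))
rx[:len(probe)] += probe
rx[120:120 + len(probe)] += 0.2 * probe

profile = correlate(rx, probe, mode="valid")     # pulse compression
lag = 50 + int(np.argmax(np.abs(profile[50:])))  # skip the direct path
print(lag)  # echo round-trip delay in samples (~120 here)
```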
Recognition of facial expressions has been widely explored to represent people's emotional states. Existing facial expression recognition systems primarily rely on external cameras, which makes it difficult to monitor an individual's facial expressions conveniently and unobtrusively in many real-life scenarios. To this end, we propose PPGface, a ubiquitous, easy-to-use, user-friendly facial expression recognition platform that leverages earable devices with a built-in PPG sensor. PPGface understands facial expressions through the dynamic PPG patterns resulting from facial muscle movements. With the aid of the accelerometer, PPGface can unobtrusively detect and recognize the user's seven universal facial expressions and the relevant body posture. We conducted a user study (N=20) using a multimodal ResNet to evaluate the performance of PPGface, and showed that PPGface can detect different facial expressions with 93.5% accuracy and a 0.93 F1-score. In addition, to explore the robustness and usability of the proposed platform, we conducted several comprehensive experiments under real-world settings. The overall results of this work validate its great potential to be employed in future commodity earable devices.
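The abstract mentions a multimodal ResNet fusing PPG and accelerometer data but does not detail the architecture; the following is a minimal two-branch fusion sketch in which the window length and the small 1D-conv branches are assumptions standing in for the ResNet backbones.

```python
# Two-branch PPG + accelerometer fusion sketch (assumed architecture).
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, n_classes: int = 7):   # seven universal expressions
        super().__init__()
        self.ppg_branch = nn.Sequential(       # 1-channel PPG waveform
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(32), nn.Flatten())
        self.acc_branch = nn.Sequential(       # 3-axis accelerometer
            nn.Conv1d(3, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(32), nn.Flatten())
        self.head = nn.Linear(16 * 32 * 2, n_classes)

    def forward(self, ppg, acc):
        z = torch.cat([self.ppg_branch(ppg), self.acc_branch(acc)], dim=1)
        return self.head(z)

model = FusionNet()
ppg = torch.randn(4, 1, 256)   # batch of 4 sensor windows (length assumed)
acc = torch.randn(4, 3, 256)   # matching accelerometer windows
print(model(ppg, acc).shape)   # torch.Size([4, 7])
```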