Deep learning, especially in the form of convolutional neural networks (CNNs), has triggered substantial improvements in computer vision and related fields in recent years. This progress is attributed to the shift from designing features and individual subsystems towards learning features and recognition systems end to end from nearly unprocessed data. For speaker clustering, however, it is still common to use handcrafted processing chains such as MFCC features and GMM-based models. In this paper, we use simple spectrograms as input to a CNN and study the optimal design of such networks for speaker identification and clustering. Furthermore, we elaborate on the question of how to transfer a network trained for speaker identification to speaker clustering. We demonstrate our approach on the well-known TIMIT dataset, achieving results comparable with the state of the art, without the need for handcrafted features.
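The spectrogram inputs mentioned above need no handcrafted feature pipeline; a minimal sketch of how such an input could be computed (the frame length, hop size, and function name are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: windowed frames -> per-frame FFT magnitudes."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    # Shape: (num_frames, frame_len // 2 + 1); this 2-D array is the CNN input.
    return np.abs(np.fft.rfft(np.array(frames), axis=1))
```

For example, a 1 kHz tone sampled at 8 kHz concentrates its energy in frequency bin 1000 / (8000 / 256) = 32 of each frame.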
Recent work has shown that convolutional neural networks (CNNs) trained in a supervised fashion for speaker identification can extract features from spectrograms that are useful for speaker clustering. These features, called embeddings, are the activations of a certain hidden layer. However, previous approaches require large amounts of additional speaker data to learn the embedding, and although the clustering results are then on par with more traditional approaches using MFCC features and the like, room for improvement stems from the fact that these embeddings are trained on a surrogate task, identifying a few specific speakers, that is rather far removed from segregating unknown voices. We address both problems by training a CNN to extract embeddings that are similar for equal speakers (regardless of their specific identity) using weakly labeled data. We demonstrate our approach on the well-known TIMIT dataset, which has often been used for speaker clustering experiments in the past. We exceed the clustering performance of all previous approaches, yet require just 100 instead of 590 unrelated speakers to learn an embedding suited for clustering.
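The objective described above, making embeddings similar for equal speakers regardless of identity, can be sketched as a pairwise contrastive loss; the margin value and function name are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_speaker, margin=1.0):
    """Pull embeddings of the same speaker together, push others apart."""
    d = np.linalg.norm(emb_a - emb_b)  # Euclidean distance between embeddings
    if same_speaker:
        return d ** 2                  # small distance -> small loss
    return max(0.0, margin - d) ** 2   # penalize only pairs closer than the margin
```

The loss vanishes for identical same-speaker embeddings and for different-speaker embeddings already farther apart than the margin, so only the offending pairs generate a training signal.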
Background: Slow-paced breathing training (6 breaths per minute [BPM]) improves physiological and psychological well-being by inducing relaxation characterized by increased heart rate variability (HRV). However, classic breathing training reaches only a limited target group, and retention rates are very low. Although a gameful approach may help overcome these challenges, it is crucial to enable breathing training in a scalable context (eg, smartphone only) and to ensure that it remains effective. Despite the health benefits, however, no validated mobile gameful breathing training featuring a biofeedback component based on breathing seems to exist.
Objective: This study aims to describe the design choices and their implementation in a concrete mobile gameful breathing training app. Furthermore, it aims to deliver an initial validation of the efficacy of the resulting app.
Methods: Previous work was used to derive informed design choices, which, in turn, were applied to build the gameful breathing training app Breeze. In a pretest (n=3), design weaknesses in Breeze were identified, and Breeze was adjusted accordingly. The app was then evaluated in a pilot study (n=16). To ascertain that effectiveness was maintained, recordings of breathing rates and HRV-derived measures (eg, root mean square of successive differences [RMSSD]) were collected. We compared 3 stages: baseline, a standard breathing training deployed on a smartphone, and Breeze.
Results: Overall, 5 design choices were made: use of cool colors, natural settings, tightly incorporated game elements, game mechanics reflecting physiological measures, and a light narrative and progression model. Breeze was effective, as it resulted in a slow-paced breathing rate of 6 BPM, which, in turn, resulted in significantly increased HRV measures compared with baseline (P<.001 for RMSSD). In general, the app was perceived positively by the participants.
However, some participants criticized the somewhat weaker clarity of the breathing instructions compared with a standard breathing training app.
Conclusions: The implemented breathing training app Breeze maintained its efficacy despite the use of game elements. Moreover, the app was perceived positively by participants, although there was room for improvement.
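RMSSD, the HRV measure reported above, is the root mean square of the successive differences between consecutive interbeat (RR) intervals; a minimal sketch:

```python
import numpy as np

def rmssd(rr_intervals_ms):
    """Root mean square of successive differences of RR intervals (in ms)."""
    diffs = np.diff(np.asarray(rr_intervals_ms, dtype=float))
    return float(np.sqrt(np.mean(diffs ** 2)))
```

Slow-paced breathing amplifies beat-to-beat variation, so the successive RR differences, and hence RMSSD, increase relative to baseline.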
Slow-paced biofeedback-guided breathing training has been shown to improve cardiac functioning and psychological well-being. Current training options, however, attract only a fraction of individuals and are limited in their scalability because they require dedicated biofeedback hardware. In this work, we present Breeze, a mobile application that uses a smartphone's microphone to continuously detect breathing phases, which then drive a gamified biofeedback-guided breathing training. Approximately 2.76 million breathing sounds from 43 subjects, along with control sounds, were collected and labeled to train and test our breathing detection algorithm. We model breathing as inhalation-pause-exhalation-pause sequences and implement a phase-detection system with an attention-based LSTM model in conjunction with a CNN-based breath extraction module. Biofeedback-guided breathing training with Breeze runs in real time and achieves 75.5% accuracy in breathing phase detection. Breeze was also evaluated in a pilot study with 16 new subjects, which demonstrated that the majority of subjects preferred Breeze to a validated active control condition in terms of usefulness, enjoyment, control, and usage intentions. Breeze is also effective for strengthening users' cardiac functioning by increasing high-frequency heart rate variability. The results of our study suggest that Breeze could potentially be utilized in clinical and self-care activities.
[Fig. 1. Overview of Breeze, a mobile gamified biofeedback breathing training: a repeated acoustic breathing sequence (inhale, pause, exhale, pause), captured via nose and mouth, drives the sailboat game.]
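The inhalation-pause-exhalation-pause cycle described above can be enforced on top of per-frame phase predictions with a simple state machine; a hypothetical sketch (the state names and transition rule are assumptions, not the paper's implementation):

```python
# The breathing cycle as a fixed order of phases.
NEXT_PHASE = {"inhale": "pause1", "pause1": "exhale",
              "exhale": "pause2", "pause2": "inhale"}

def advance(state, detected):
    """Accept a detected phase only if it is the legal successor of the
    current phase; otherwise keep the current state, suppressing spurious flips."""
    return detected if NEXT_PHASE[state] == detected else state
```

Constraining transitions this way means a single misclassified frame (eg, "exhale" detected in the middle of an inhalation) cannot derail the game state.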
Background: Slow-paced breathing training can have positive effects on physiological and psychological well-being. Unfortunately, usage statistics indicate that adherence to breathing training apps is low. Recent work suggests that gameful breathing training may help overcome this challenge.
Objective: This study aimed to introduce and evaluate the gameful breathing training app Breeze 2 and its novel real-time breathing detection algorithm, which enables the interactive components of the app.
Methods: We developed the breathing detection algorithm by using deep transfer learning to detect inhalation, exhalation, and nonbreathing sounds (including silence). An additional heuristic prolongs detected exhalations to stabilize the algorithm's predictions. We evaluated Breeze 2 with 30 participants (women: n=14, 47%; age: mean 29.77, SD 7.33 years). Participants performed breathing training with Breeze 2 in 2 sessions, with and without headphones. They answered questions regarding user engagement (User Engagement Scale Short Form [UES-SF]), perceived effectiveness (PE), perceived relaxation effectiveness, and perceived breathing detection accuracy. We used Wilcoxon signed-rank tests to compare the UES-SF, PE, and perceived relaxation effectiveness scores with neutral scores. Furthermore, we correlated perceived breathing detection accuracy with actual multi-class balanced accuracy to determine whether participants could perceive the actual breathing detection performance. We also conducted a repeated-measures ANOVA to investigate differences in breathing detection balanced accuracy with and without the heuristic and when classifying data captured from headphones versus smartphone microphones. The analysis controlled for potential between-subject effects of the participants' sex.
Results: Our results show scores that were significantly higher than neutral scores for the UES-SF (W=459; P<.001), PE (W=465; P<.001), and perceived relaxation effectiveness (W=358; P<.001).
Perceived breathing detection accuracy correlated significantly with the actual multi-class balanced accuracy (r=0.51; P<.001). Furthermore, we found that the heuristic significantly improved the breathing detection balanced accuracy (F1,25=6.23; P=.02) and that detection performed better on data captured from smartphone microphones than on data from headphones (F1,25=17.61; P<.001). We did not observe any significant between-subject effects of sex. Breathing detection without the heuristic reached a multi-class balanced accuracy of 74% on the collected audio recordings.
Conclusions: Most participants (28/30, 93%) perceived Breeze 2 as engaging and effective. Furthermore, breathing detection worked well for most participants, as indicated by both the perceived and the actual detection accuracy. In future work, we aim to use the collected breathing sounds to improve breathing detection with regard to its stability and performance. We also plan to use Breeze 2 as an intervention tool in various studies targeting the prevention and management of noncommunicable diseases.
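Multi-class balanced accuracy, the metric reported above, is the unweighted mean of per-class recalls, which keeps a dominant class such as nonbreathing sounds from inflating the score; a minimal sketch:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    # Recall for class c: fraction of true-c samples predicted as c.
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))
```

For instance, with true labels ["in", "in", "ex", "non"] and predictions ["in", "ex", "ex", "non"], the per-class recalls are 0.5, 1.0, and 1.0, giving a balanced accuracy of about 0.83 even though plain accuracy would be 0.75.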