This paper addresses the problem of automatic cry signal segmentation for the purposes of infant cry analysis. The main goal is to automatically detect expiratory and inspiratory phases from recorded cry signals. The approach used in this paper is made up of three stages: signal decomposition, feature extraction, and classification. In the first stage, the short-time Fourier transform, empirical mode decomposition (EMD), and the wavelet packet transform are considered. In the second stage, various sets of features are extracted, and in the third stage, two supervised learning methods are considered: Gaussian mixture models and hidden Markov models with four and five states. A central aim of this work is to investigate the performance of EMD and to compare it with the other standard decomposition techniques. Combinations of two and three intrinsic mode functions (IMFs) resulting from EMD are used to represent the cry signal. The performance of nine different segmentation systems is evaluated. The experiments for each system are repeated several times with different training and testing datasets, randomly chosen using a 10-fold cross-validation procedure. The lowest global classification error rates, around 8.9% and 11.06%, are achieved using a Gaussian mixture model classifier and a hidden Markov model classifier, respectively. Among all IMF combinations, the best-performing combination is IMF3+IMF4+IMF5.
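As an illustration of the evaluation protocol mentioned above, the following is a minimal sketch of a randomized 10-fold cross-validation split. The function names (`k_fold_splits`, `mean_error_rate`), the seed, and the idea of splitting by recording index are assumptions made for this example; the abstract does not say how the folds were actually built.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Hypothetical helper: items are identified by integer index only.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # random assignment of items to folds
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


def mean_error_rate(fold_errors):
    """Average the per-fold error rates into one global figure."""
    return sum(fold_errors) / len(fold_errors)
```

Each item lands in exactly one test fold, so every recording is used for testing once and for training in the remaining folds.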
An analysis of newborn cry signals, either for the early diagnosis of neonatal health problems or to determine the category of a cry (e.g., pain, discomfort, birth cry, and fear), requires a preliminary preprocessing step to quantify the important expiratory and inspiratory parts of the audio recordings of newborn cries. Data typically contain clean cries interspersed with sections of other sounds (generally, the sounds of speech, noise, or medical equipment) or silence. The purpose of signal segmentation is to differentiate the important acoustic parts of the cry recordings from the unimportant acoustic activities that compose the audio signals. This paper reports on our research to establish an automatic segmentation system for newborn cry recordings based on Hidden Markov Models using the HTK (Hidden Markov Model Toolkit). The system presented in this report is able to detect the two basic constituents of a cry, which are the audible expiratory and inspiratory parts, using a two-stage recognition architecture. The system is trained and tested on a real database collected from normal and pathological newborns. The experimental results indicate that the system yields accuracies of up to 83.79%. [...] extracted from the cries and the health problems of the child [2][3][4][5]. Various studies are currently under way to devise a tool that analyzes cries automatically to diagnose neonatal pathologies [6][7][8]. We are involved in the design of an automatic system for early diagnosis, called the Newborn Cry-based Diagnostic System (NCDS), which can detect certain pathologies in newborns at an early stage. The implementation of this system requires a database containing hundreds of cry signals. The overwhelming problem that arises when working with such a database is the diversity of acoustic activities that compose the audio recordings, such as background noise, speech, the sound of medical equipment, and silence.
Such diversity could harm the analysis process, as the presence of any acoustic component other than the cry itself could degrade NCDS performance and lead to the misclassification of pathologies. This is because the NCDS would decode every segment of the recorded signal, whether it is part of a cry or not. In this case, the insertion of unwanted segments among the essential crying segments would lengthen the classification process unnecessarily and leave the system prone to error. An important subtask of the NCDS is the manipulation of the newborn cry sound, and performing this subtask requires a segmentation system. To date, few studies have been carried out in this area. In this paper, we propose an automatic segmentation module designed to isolate the audible expiration and inspiration parts of cry sounds to serve as a preprocessing step of our NCDS. The rest of this paper is organized as follows: Related work is presented in section 2. The HMM and the HTK are reviewed briefly in section 3. The training corpus and the testing corpus are described in section 4. In section 5, the architect...
The detection of cry sounds is generally an important pre-processing step for various applications involving cry analysis, such as diagnostic systems, electronic monitoring systems, emotion detection, and robotics for baby caregivers. Given its complexity, automatic cry segmentation is a challenging problem. In this paper, a framework for automatic cry sound segmentation for application in a cry-based diagnostic system is proposed. The contribution of various additional time- and frequency-domain features to the robustness of a Gaussian mixture model/hidden Markov model (GMM/HMM)-based cry segmentation system in noisy environments is studied. A fully automated segmentation algorithm to extract cry sound components, namely, audible expiration and inspiration, is introduced. It is grounded on two approaches: statistical analysis based on GMM or HMM classifiers, and a post-processing method based on intensity, zero crossing rate, and fundamental frequency feature extraction. The main focus of this paper is to extend the systems developed in previous works to include a post-processing stage with a set of corrective and enhancing tools to improve the classification performance. This full approach allows the start and end points of the expiratory and inspiratory components of a cry signal, EXP and INSV, respectively, to be determined precisely in any given sound signal. Experimental results indicate the effectiveness of the proposed solution. EXP and INSV detection rates of approximately 94.29% and 92.16%, respectively, were achieved by applying a tenfold cross-validation technique to avoid over-fitting.
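Two of the post-processing features named above, intensity and zero crossing rate, are standard short-time measures. The sketch below shows one common way to compute them per frame. The frame length, hop size, decibel reference, and function names are assumptions for illustration; the abstract does not give the definitions or parameters used by the authors.

```python
import math


def frames(signal, frame_len, hop):
    """Split a sequence of samples into overlapping frames (incomplete tail dropped)."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def intensity_db(frame, eps=1e-12):
    """Mean power of the frame in decibels, relative to an amplitude of 1.0."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + eps)  # eps avoids log10(0) on silent frames


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)
```

A low-intensity frame with a high zero crossing rate is typically noise-like or unvoiced, which is why such measures are often used to correct frame labels produced by a statistical classifier.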
Our deduction is that quantification of the variability of these parameters is useful for differentiating the cries of a healthy newborn from those of a newborn with a pathology, and that these data can be used for the early diagnosis of newborn diseases.
We make use of the information in an infant's cry signal to identify the infant's psychological condition. Gaussian mixture models (GMMs) are applied to distinguish between healthy full-term and premature infants, and those with specific medical problems available in our cry database. The cry pattern for each pathological condition is created using an adapted boosting mixture learning (BML) method to estimate the mixture model parameters. In the first experiment, test results demonstrate that the introduced adapted BML method for learning GMMs performs better than the conventional EM-based re-estimation algorithm, used as a reference system, in a multipathological classification task. This newborn cry-based diagnostic system (NCDS) extracts Mel-frequency cepstral coefficients (MFCCs) as a feature vector for the cry patterns of newborn infants. In the binary classification experiment, the system assigns a test infant's cry signal to one of two groups, healthy or pathological, based on MFCCs. The binary classifier achieved a true positive rate of 80.77% and a true negative rate of 86.96%, which show the ability of the system to correctly identify healthy and diseased infants, respectively.
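For reference, the true positive and true negative rates quoted above are computed from the counts of correct and incorrect decisions per class. The following is a minimal sketch; the function name, label strings, and toy data are made up for this example and are not taken from the study.

```python
def binary_rates(y_true, y_pred, positive):
    """Return (true_positive_rate, true_negative_rate) for one positive label."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        else:
            if guess == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for illustration only, with "healthy" taken as the positive class.
truth = ["healthy", "healthy", "healthy", "healthy", "sick", "sick"]
guess = ["healthy", "healthy", "healthy", "sick", "sick", "healthy"]
tpr, tnr = binary_rates(truth, guess, positive="healthy")
```

Here the positive class is "healthy", following the abstract's statement that the true positive rate reflects the correct identification of healthy infants.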
Our challenge in the current study is to extend research on the cries of newborns for the early diagnosis of different pathologies. This paper proposes a recognition system for healthy and pathological cries using a probabilistic neural network classifier. Two different kinds of features are used to characterize newborn cry signals: 1) acoustic features such as fundamental frequency glide (F0glide) and resonance frequency dysregulation (RFsdys); 2) conventional features such as mel-frequency cepstrum coefficients. This paper describes the automatic estimation of the proposed characteristics and the performance evaluation of these features in identifying pathological cries. The methods adopted for F0glide and RFsdys estimation are based on the derivative of the F0 contour and the jump "J" of the RFs between two subsequent tunings, respectively. The database used contains 3250 cry samples of full-term and preterm newborns, and includes healthy and pathological cries. The results indicate an important association between the quantified features and some of the studied pathologies, and also an improvement in the identification of pathological cries. The best result obtained is 88.71% for the correct identification of the health status of preterm newborns, and 82% for the correct identification of full-term infants with a specific disease. We conclude that using the proposed characteristics improves the diagnosis of pathologies in newborns. Moreover, the method applied in the estimation of these characteristics allows us to extend this study to other uninvestigated pathologies.
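The abstract states only that F0glide estimation is based on the derivative of the F0 contour. One simple way to read that idea is sketched below: approximate the derivative by a finite difference and flag frames whose slope magnitude exceeds a threshold. The threshold value, the frame period, and the function names are hypothetical and are not the authors' procedure.

```python
def f0_slope(f0_hz, frame_period_s):
    """First-order finite difference of the F0 contour, in Hz per second."""
    return [(b - a) / frame_period_s for a, b in zip(f0_hz, f0_hz[1:])]


def glide_frames(f0_hz, frame_period_s, min_slope_hz_per_s):
    """Indices i where F0 changes rapidly between frame i and frame i + 1.

    min_slope_hz_per_s is an illustrative threshold, not a published value.
    """
    slopes = f0_slope(f0_hz, frame_period_s)
    return [i for i, s in enumerate(slopes) if abs(s) >= min_slope_hz_per_s]


# Toy contour: flat at 400 Hz, a rise to 500 Hz, then flat again.
contour = [400.0, 400.0, 450.0, 500.0, 500.0]
rapid = glide_frames(contour, frame_period_s=0.01, min_slope_hz_per_s=1000.0)
```

In this toy contour, the two frame transitions during the rise from 400 Hz to 500 Hz are flagged, while the flat portions are not.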
Several hypotheses have been formulated in numerous studies as a result of observing spectrograms of the audio signals of newborn infant cries. Our study is based on a few of these hypotheses. The purpose of this article is to differentiate pathological crying from healthy crying through acoustic cry analysis based on neurophysiological parameters of newborns. The automatic estimation of the characteristics of relevant cry signals, such as phonation, hyperphonation, and dysphonation, expressed as percentages, as well as unvoiced sound and mode change percentages, has enabled us to distinguish among the pathologies selected for this study. The results obtained have helped us to make quantitative associations between cry characteristics and pathological conditions affecting newborns.
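To make the percentage-based characteristics concrete, the sketch below assumes that each analysis frame of a cry has already been assigned one mode label (for example phonation, hyperphonation, dysphonation, or unvoiced). How frames are labeled is not described in the abstract, and the definition of the mode change percentage used here (the share of adjacent frame pairs whose labels differ) is an assumption for illustration.

```python
from collections import Counter


def mode_percentages(frame_labels):
    """Percentage of frames carrying each mode label."""
    if not frame_labels:
        raise ValueError("frame_labels must not be empty")
    total = len(frame_labels)
    return {label: 100.0 * n / total for label, n in Counter(frame_labels).items()}


def mode_change_percentage(frame_labels):
    """Percentage of adjacent frame pairs whose labels differ (assumed definition)."""
    if len(frame_labels) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(frame_labels, frame_labels[1:]) if a != b)
    return 100.0 * changes / (len(frame_labels) - 1)


# Toy label sequence for illustration only.
labels = ["phonation"] * 6 + ["hyperphonation"] * 2 + ["dysphonation"] * 2
shares = mode_percentages(labels)
changes = mode_change_percentage(labels)
```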