A Multistage Heterogeneous Stacking Ensemble Model for Augmented Infant Cry Classification

Joshi, Vinayak Ravi; Srinivasan, Kathiravan; Vincent, Pascal; Rajinikanth, V.; Chang, Chuan‐Yu

doi:10.3389/fpubh.2022.819865

Cited by 13 publications

(7 citation statements)

References 31 publications

“…Researchers conducted a study on classifying infant crying into four categories: hunger, pain, tiredness, and diaper (Joshi et al, 2022 ). They first preprocessed the signals and converted them into Mel-spectrograms.…”

Section: Related Work (mentioning)

confidence: 99%

Machine learning-based infant crying interpretation

Hammoud, Getahun, Baldycheva et al. 2024, Front. Artif. Intell.

Crying is an inevitable behavior throughout infancy, and caregivers often have difficulty interpreting its underlying cause. Crying can be treated as an audio signal that carries a message about the infant's state, such as discomfort, hunger, or sickness. Primary caregivers typically rely on traditional ways of understanding these states, and failing to understand them correctly can cause severe problems. Several methods attempt to solve this problem; however, proper audio feature representation and classifiers are necessary for better results. This study uses time-, frequency-, and time-frequency-domain feature representations to gain in-depth information from the data. The time-domain features include zero-crossing rate (ZCR) and root mean square (RMS), the frequency-domain feature includes the Mel-spectrogram, and the time-frequency-domain feature includes Mel-frequency cepstral coefficients (MFCCs). Moreover, time-series imaging algorithms are applied to transform 20 MFCC features into images: Gramian angular difference fields, Gramian angular summation fields, Markov transition fields, recurrence plots, and RGB GAF. These features are then provided to different machine learning classifiers, such as decision tree, random forest, k-nearest neighbors, and bagging. The use of MFCCs, ZCR, and RMS as features achieved high performance, outperforming the state of the art (SOTA). Optimal parameters are found via grid search with 10-fold cross-validation. Our MFCC-based random forest (RF) classifier achieved an accuracy of 96.39%, outperforming the SOTA scalogram-based ShuffleNet classifier, which had an accuracy of 95.17%.
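
The two time-domain features named in the abstract above, ZCR and RMS, can be illustrated with a minimal sketch. It uses the common textbook definitions, which are an assumption here and not necessarily the exact variants used by the cited study. The function names, frame length, and hop length are hypothetical and chosen only for illustration.

```python
import math


def frame_signal(signal, frame_length, hop_length):
    """Split a 1-D sequence into overlapping frames; a trailing partial frame is dropped."""
    last_start = len(signal) - frame_length
    return [signal[i:i + frame_length] for i in range(0, last_start + 1, hop_length)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def root_mean_square(frame):
    """Square root of the mean squared amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


if __name__ == "__main__":
    # Hypothetical toy signal: a 5 Hz sine sampled at 100 Hz for one second.
    sample_rate = 100
    signal = [math.sin(2 * math.pi * 5 * n / sample_rate) for n in range(sample_rate)]
    for frame in frame_signal(signal, frame_length=40, hop_length=20):
        print(round(zero_crossing_rate(frame), 3), round(root_mean_square(frame), 3))
```

Each frame yields one ZCR value and one RMS value, so a recording becomes two short sequences that can be summarized (for example by their mean) and passed to a classifier alongside other features.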

“…Most researchers have adopted the cepstral domain features in the feature extraction from audio signals such as Mel frequency cepstral coefficients (MFCC) [ 33 , 34 , 35 , 36 ], linear frequency cepstral coefficients (LFCC) [ 37 ], short-time cepstral coefficients (STCC) [ 37 ], and Bark frequency cepstral coefficients (BFCC) [ 38 ], combined with both DL and traditional ML models. MFCCs were the most used in identifying infant pathologies.…”

Section: Literature Review (mentioning)

confidence: 99%

“…A similar feature extraction was also used along with KNN in [ 35 ] and achieved an accuracy of 71.42% in determining the reason for crying, including hunger, belly pain, need for burping, discomfort, and tiredness. In [ 36 ], MFCC was used with the CNN model with multiple variants to test and multistage a heterogeneous stacking ensemble model, which consists of four levels of algorithms, Nu-support vector classification, random forest (RF), XGBoost, and AdaBoost. The classification results of the CNN model outperformed the other ML algorithms, reaching an accuracy of 93.7%.…”

Section: Literature Review (mentioning)

confidence: 99%

Infant Cry Signal Diagnostic System Using Deep Learning and Fused Features

2023

Early diagnosis of medical conditions in infants is crucial for ensuring timely and effective treatment. However, infants are unable to verbalize their symptoms, making it difficult for healthcare professionals to accurately diagnose their conditions. Crying is often the only way for infants to communicate their needs and discomfort. In this paper, we propose a medical diagnostic system for interpreting infants’ cry audio signals (CAS) using a combination of different audio domain features and deep learning (DL) algorithms. The proposed system utilizes a dataset of labeled audio signals from infants with specific pathologies. The dataset includes two infant pathologies with high mortality rates, neonatal respiratory distress syndrome (RDS), sepsis, and crying. The system employed the harmonic ratio (HR) as a prosodic feature, the Gammatone frequency cepstral coefficients (GFCCs) as a cepstral feature, and image-based features through the spectrogram which are extracted using a convolution neural network (CNN) pretrained model and fused with the other features to benefit multiple domains in improving the classification rate and the accuracy of the model. The different combination of the fused features is then fed into multiple machine learning algorithms including random forest (RF), support vector machine (SVM), and deep neural network (DNN) models. The evaluation of the system using the accuracy, precision, recall, F1-score, confusion matrix, and receiver operating characteristic (ROC) curve, showed promising results for the early diagnosis of medical conditions in infants based on the crying signals only, where the system achieved the highest accuracy of 97.50% using the combination of the spectrogram, HR, and GFCC through the deep learning process. 
The findings demonstrated the importance of fusing different audio features, especially the spectrogram, through the learning process rather than by simple concatenation, and of using deep learning algorithms to extract sparsely represented features for later classification, which improves the separation between different infant pathologies. The results outperformed the published benchmark paper by extending the task to multiclass classification (RDS, sepsis, and healthy), investigating a new type of feature (the spectrogram), and using a new feature fusion technique: fusion through the learning process of the deep learning model.

“…Ting et al 21 classified asphyxia infant cry using hybrid speech features and CNN. Joshi et al 22 proposed a multistage heterogeneous ensemble model for augmented infant cry classification. Initially, the mel‐frequency cepstral coefficients algorithm was used to generate the spectrograms and to analyze the varying feature vectors.…”

Section: Literature Review (mentioning)

confidence: 99%

CNN‐SCNet: A CNN net‐based deep learning framework for infant cry detection in household setting

Jahangir 2023, Engineering Reports

Infants are vulnerable to several health problems and cannot express their needs clearly. Whenever they are in a state of urgency and require immediate attention, they cry, which is a form of communication for them. Therefore, the parents of the infants always need to be alert and keep continuous supervision of their infants. However, parents cannot monitor their infants all the time. An infant monitoring system could be a possible solution to monitor the infants, determine when the infants are crying, and notify the parents immediately. Although many such systems are available, most cannot detect infant cries. Some systems have infant cry detection mechanisms, but those mechanisms are not very accurate in detecting infant cries because the mechanisms either include obsolete approaches or machine learning (ML) models that cannot identify infant cries from noisy household settings. To address this limitation, in this research, different conventional and hybrid ML models were developed and analyzed in detail to find out the best model for detecting infant cries in a household setting. A stacked classifier is proposed using different state‐of‐the‐art technologies, outperforming all other developed models. The proposed CNN‐SCNet's (CNN‐Stacked Classifier Network) precision, recall, and f1‐score were found to be 98.72%, 98.05%, and 98.39%, respectively. Infant monitoring systems can use this classifier to detect infant cries in noisy household settings.
