Neural Network Architecture That Combines Temporal and Summative Features for Infant Cry Classification in the Interspeech 2018 Computational Paralinguistics Challenge

Huckvale, Mark

doi:10.21437/interspeech.2018-1959

Cited by 8 publications

(8 citation statements)

References 8 publications


“…This result reflects recent developments in the DCASE community, in which virtually all competition systems contain convolutional stages [21], [22], [25], [26], [28], [43]-[46]. However the paralinguistics community still relies primarily on pure recurrent networks [8], [9], [18], [19].…”

Section: Discussion (supporting)

confidence: 63%

“…Recurrent Stage: Gated recurrent units (GRU) and long short-term memory units (LSTM) [54] are the two most common recurrent types in paralinguistics [9], [18], [19], [55], [56]. Unidirectional [9], [18] as well as bidirectional [19], [56] networks are popular. We used the CuDNN implementations to reduce the training time of the recurrent units.…”

Section: Hyperparameter Search Space (mentioning)

confidence: 99%

“…There are also intermediate approaches between these two types, which typically calculate hand-crafted feature sets on audio frames (so-called "low-level descriptors" (LLDs)) and input the resulting feature maps to RNNs [9]. Zhang et al [18] and Huckvale et al [19] reached 70.1 % UAR and 68.28 % UAR respectively through this approach. Wagner et al [9] directly dedicated their submission to the debate of end-to-end vs conventional systems by comparing the performance of feature maps with varying complexity for RNNs.…”

Section: Introduction (mentioning)

confidence: 99%
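
The statement above reports results as UAR (unweighted average recall), the mean of the per-class recalls, so each class counts equally no matter how many samples it has. The following is a minimal sketch of that calculation, not code from any of the cited systems; the function name and the example labels are hypothetical.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not data from the challenge).
y_true = ["cry", "cry", "cry", "cry", "fuss", "fuss"]
y_pred = ["cry", "cry", "cry", "fuss", "fuss", "cry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case the recall is 3/4 for "cry" and 1/2 for "fuss", so UAR is 0.625 while plain accuracy is 4/6; the gap is why UAR is preferred when classes are imbalanced.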


Comparison of Artificial Neural Network Types for Infant Vocalization Classification

Anders, Hlawitschka, Fuchs

2021

IEEE/ACM Trans. Audio Speech Lang. Process.


“…Our previous work [15] has shown that combining weighted prosodic features with MFCC features help improve the classification accuracy in a deep learning model. Other researchers have also found that F0 is critical in identifying infant cry signals [40]. Chittora and Patil used F0 to calculate unvoiced segments ratio and found out unvoiced percentage in a cry is an important parameter for analysis of infant cry [19].…”

Section: Fig 4 Multiple Order Mfcc Features Of Normal and Asphyxiate (mentioning)

confidence: 99%
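
The statement above mentions an unvoiced segments ratio derived from F0. As a rough illustration only, the sketch below computes the fraction of frames with no detected pitch in an F0 contour. It assumes unvoiced frames are encoded as 0 Hz, a common pitch-tracker convention; the exact definition used by Chittora and Patil is not given here, and the function name and contour values are hypothetical.

```python
def unvoiced_ratio(f0_hz):
    """Return the fraction of frames marked unvoiced in an F0 contour.

    Assumption: the pitch tracker reports F0 <= 0 for unvoiced frames.
    """
    if not f0_hz:
        raise ValueError("F0 contour is empty")
    unvoiced_frames = sum(1 for f0 in f0_hz if f0 <= 0)
    return unvoiced_frames / len(f0_hz)


# Hypothetical per-frame F0 values in Hz (0.0 marks an unvoiced frame).
f0_contour = [0.0, 0.0, 410.0, 425.5, 0.0, 438.2, 440.1, 0.0]

ratio = unvoiced_ratio(f0_contour)
print(f"unvoiced ratio = {ratio:.2f}")
```

A per-recording value like this could be appended to frame-level features such as MFCCs as one summary descriptor, though how the cited works weight or combine it is not specified in the excerpt.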

“…The most popular probabilistic classifier used in infant cry classification is Support Vector Machine (SVM) [26,40,43]. Many machine learning methods have been experimented in infant research.…”

Section: Infant Cry Classification Models 411 Traditional Machine L (mentioning)

confidence: 99%

A review of infant cry analysis and classification

Mudiyanselage, Gao, et al.

2021

J AUDIO SPEECH MUSIC PROC.


This paper reviews recent research works in infant cry signal analysis and classification tasks. A broad range of literatures are reviewed mainly from the aspects of data acquisition, cross domain signal processing techniques, and machine learning classification methods. We introduce pre-processing approaches and describe a diversity of features such as MFCC, spectrogram, and fundamental frequency, etc. Both acoustic features and prosodic features extracted from different domains can discriminate frame-based signals from one another and can be used to train machine learning classifiers. Together with traditional machine learning classifiers such as KNN, SVM, and GMM, newly developed neural network architectures such as CNN and RNN are applied in infant cry research. We present some significant experimental results on pathological cry identification, cry reason classification, and cry sound detection with some typical databases. This survey systematically studies the previous research in all relevant areas of infant cry and provides an insight on the current cutting-edge works in infant cry signal analysis and classification. We also propose future research directions in data processing, feature extraction, and neural network classification fields to better understand, interpret, and process infant cry signals.
