2020
DOI: 10.1109/access.2020.2995737
Detection of Speech Impairments Using Cepstrum, Auditory Spectrogram and Wavelet Time Scattering Domain Features

Abstract: We adopt a Bidirectional Long Short-Term Memory (BiLSTM) neural network and a Wavelet Scattering Transform with Support Vector Machine (WST-SVM) classifier for detecting speech impairments in patients at the early stage of central nervous system disorders (CNSD). The study includes 339 voice samples collected from 15 subjects: 7 patients with early-stage CNSD (3 Huntington's, 1 Parkinson's, 1 cerebral palsy, 1 post-stroke, 1 early dementia); the other 8 subjects were healthy. Speech data is collected using a voice recorder …

Cited by 41 publications (24 citation statements)
References 57 publications (69 reference statements)
“…To feed it to the 2D convolutional layer of the CRNN, Himid et al. [14] suggest either converting it into a spectrogram or feeding the convolutional layer of the CRNN with organized feature maps, i.e. with a context window of F log Mel band energies over T frames [21]. In the presented work, the second method is preferred to feed the proposed model.…”
Section: Data Input
confidence: 99%
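The second method quoted above stacks a context window of F log-Mel band energies over T frames into a 2D feature map for the convolutional layer. A minimal numpy sketch of that windowing step is below; T = 40, the hop of 20 frames, and F = 64 bands are illustrative choices, and the log-Mel matrix is stubbed with random values rather than computed from audio:

```python
import numpy as np

def context_windows(logmel, T=40, hop=20):
    """Slice an (n_frames, F) log-Mel matrix into overlapping
    context windows of T frames each -> (n_windows, T, F)."""
    n_frames, F = logmel.shape
    starts = range(0, n_frames - T + 1, hop)
    return np.stack([logmel[s:s + T] for s in starts])

# Stand-in for real log-Mel band energies: 200 frames x 64 bands.
feats = np.random.default_rng(0).standard_normal((200, 64))
wins = context_windows(feats)
print(wins.shape)  # -> (9, 40, 64): 9 maps, each T=40 frames of F=64 bands
```

Each window is then a 2D "image" that a CRNN's convolutional front end can consume directly.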
“…The purpose of E3 was to classify voice recordings (64 kbps audio files in mp3 format), taken from the T14 task, into the impaired and healthy classes, thus building a model to predict suspected speech impairments for a subject. To eliminate silence segments that did not contain useful information on the health condition of the speaking person, the isolation of speech segments using the thresholding method was applied, which is described in more detail in Lauraitis et al [71].…”
Section: E3: Speech Impairment Detection Using BiLSTM
confidence: 99%
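The thresholding method mentioned above isolates speech segments by discarding silence. The sketch below is a generic short-time-energy gate, not the specific procedure of Lauraitis et al. [71]; the 400-sample frame length and the -40 dB threshold are assumed values:

```python
import numpy as np

def isolate_speech(signal, frame=400, thresh_db=-40.0):
    """Keep only non-overlapping frames whose short-time energy is
    within thresh_db of the loudest frame (simple silence removal)."""
    n = len(signal) // frame
    frames = signal[:n * frame].reshape(n, frame)
    energies = np.mean(frames**2, axis=1)
    # Energy relative to the loudest frame, in dB (epsilons avoid log(0)).
    db = 10 * np.log10(energies / (energies.max() + 1e-12) + 1e-12)
    return frames[db > thresh_db].ravel()

# Half a second of silence, one second of a 220 Hz tone, half a second of silence.
sr = 16000
t = np.arange(sr) / sr
sig = np.concatenate([np.zeros(sr // 2), np.sin(2 * np.pi * 220 * t), np.zeros(sr // 2)])
speech = isolate_speech(sig)  # keeps only the tone segment
```

A real pipeline would typically smooth the frame decisions (e.g. require a minimum run of voiced frames) before cutting, but the thresholding idea is the same.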
“…1. A typical waveform variance of a healthy person and an individual suffering from a speech impairment (data taken from the dataset described in [20,21])…”
Section: Literature Review
confidence: 99%
“…Previous studies on early diagnosis of PD include [19], which presented an ensemble classifier based on a Deep Belief Network (DBN) and a Self-Organizing Map (SOM) for remote tracking of PD progress. Recent studies [20,21] proposed a hybrid model based on a bidirectional LSTM (Bi-LSTM) neural network and a wavelet scattering transform (WST) with an SVM classifier to detect speech impairments. The authors experimented on 15 subjects, 7 of them diseased, making up 339 voice samples.…”
Section: A. Related Studies on Speech Impairment
confidence: 99%
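The WST-SVM pipeline cited above feeds scattering coefficients — band-pass filtering, a modulus nonlinearity, then time averaging — to an SVM. As a rough first-order illustration only (a toy stand-in, not the authors' implementation or a full scattering network, which would also compute second-order paths), such coefficients can be sketched with a dyadic Gaussian band-pass bank; the J = 6 scales and bandwidths are assumed values:

```python
import numpy as np

def scatter1(x, J=6):
    """Toy first-order scattering-like features: for each dyadic scale,
    band-pass filter in the frequency domain, take the modulus, and
    average over time, giving one translation-invariant coefficient."""
    N = len(x)
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(N)          # normalized frequencies in [0, 0.5]
    coeffs = []
    for j in range(J):
        fc = 0.25 / 2**j                # dyadic centre frequencies
        bw = fc / 2
        H = np.exp(-((freqs - fc)**2) / (2 * bw**2))  # Gaussian band-pass
        band = np.fft.irfft(X * H, n=N)
        coeffs.append(np.mean(np.abs(band)))          # modulus + averaging
    return np.array(coeffs)

x = np.sin(2 * np.pi * 0.05 * np.arange(4096))  # test tone at normalized freq 0.05
coeffs = scatter1(x)                            # one coefficient per scale
```

Vectors like `coeffs` (one per recording) would then be the feature input to an SVM classifier in a WST-SVM setup.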