2020
DOI: 10.1109/access.2020.2995737
Detection of Speech Impairments Using Cepstrum, Auditory Spectrogram and Wavelet Time Scattering Domain Features

Abstract: We adopt a Bidirectional Long Short-Term Memory (BiLSTM) neural network and a Wavelet Scattering Transform with Support Vector Machine (WST-SVM) classifier for detecting speech impairments in patients at the early stage of central nervous system disorders (CNSD). The study includes 339 voice samples collected from 15 subjects: 7 patients with early-stage CNSD (3 Huntington's, 1 Parkinson's, 1 cerebral palsy, 1 post-stroke, 1 early dementia); the other 8 subjects were healthy. Speech data is collected using a voice recorder …

Cited by 41 publications (24 citation statements)
References 57 publications (69 reference statements)
“…To feed it to the 2D convolutional layer of the CRNN, Himid et al. [14] suggest either converting it into a spectrogram or feeding the convolutional layer of the CRNN with organized feature maps, i.e. with a context window of F log Mel band energies over T frames [21]. In the presented work, the second method is preferred to feed the proposed model.…”
Section: Data Input
confidence: 99%
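The second method quoted above stacks a context window of F log-Mel band energies over T frames into a 2D feature map for the convolutional layer. A minimal numpy sketch of that windowing step is below; T = 40, the hop of 20 frames, and F = 64 bands are illustrative choices, and the log-Mel matrix is stubbed with random values rather than computed from audio:

```python
import numpy as np

def context_windows(logmel, T=40, hop=20):
    """Slice an (n_frames, F) log-Mel matrix into overlapping
    context windows of T frames each -> (n_windows, T, F)."""
    n_frames, F = logmel.shape
    starts = range(0, n_frames - T + 1, hop)
    return np.stack([logmel[s:s + T] for s in starts])

# Stand-in for real log-Mel band energies: 200 frames x 64 bands.
feats = np.random.default_rng(0).standard_normal((200, 64))
wins = context_windows(feats)
print(wins.shape)  # -> (9, 40, 64): 9 maps, each T=40 frames of F=64 bands
```

Each window is then a 2D "image" that a CRNN's convolutional front end can consume directly.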
“…The purpose of E3 was to classify voice recordings (64 kbps audio files in mp3 format), taken from the T14 task, into the impaired and healthy classes, thus building a model to predict suspected speech impairments for a subject. To eliminate silence segments that did not contain useful information on the health condition of the speaking person, the isolation of speech segments using the thresholding method was applied, which is described in more detail in Lauraitis et al [71].…”
Section: E3: Speech Impairment Detection Using BiLSTM
confidence: 99%
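The thresholding method mentioned above isolates speech segments by discarding silence. The sketch below is a generic short-time-energy gate, not the specific procedure of Lauraitis et al. [71]; the 400-sample frame length and the -40 dB threshold are assumed values:

```python
import numpy as np

def isolate_speech(signal, frame=400, thresh_db=-40.0):
    """Keep only non-overlapping frames whose short-time energy is
    within thresh_db of the loudest frame (simple silence removal)."""
    n = len(signal) // frame
    frames = signal[:n * frame].reshape(n, frame)
    energies = np.mean(frames**2, axis=1)
    # Energy relative to the loudest frame, in dB (epsilons avoid log(0)).
    db = 10 * np.log10(energies / (energies.max() + 1e-12) + 1e-12)
    return frames[db > thresh_db].ravel()

# Half a second of silence, one second of a 220 Hz tone, half a second of silence.
sr = 16000
t = np.arange(sr) / sr
sig = np.concatenate([np.zeros(sr // 2), np.sin(2 * np.pi * 220 * t), np.zeros(sr // 2)])
speech = isolate_speech(sig)  # keeps only the tone segment
```

A real pipeline would typically smooth the frame decisions (e.g. require a minimum run of voiced frames) before cutting, but the thresholding idea is the same.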
“…1. A typical waveform variance of a healthy person and an individual suffering from a speech impairment (data taken from the dataset described in [20,21])…”
Section: Literature Review
confidence: 99%
“…Previous studies on early diagnosis of PD include [19], which presented an ensemble classifier based on a Deep Belief Network (DBN) and a Self-Organizing Map (SOM) for remote tracking of PD progress. Recent studies [20,21] proposed a hybrid model based on a bidirectional LSTM (Bi-LSTM) neural network and a wavelet scattering transform (WST) with an SVM classifier to detect speech impairments. The authors experimented on 15 subjects, 7 of them diseased, making up 339 voice samples.…”
Section: A. Related Studies on Speech Impairment
confidence: 99%
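The WST-SVM pipeline cited above feeds scattering coefficients — band-pass filtering, a modulus nonlinearity, then time averaging — to an SVM. As a rough first-order illustration only (a toy stand-in, not the authors' implementation or a full scattering network, which would also compute second-order paths), such coefficients can be sketched with a dyadic Gaussian band-pass bank; the J = 6 scales and bandwidths are assumed values:

```python
import numpy as np

def scatter1(x, J=6):
    """Toy first-order scattering-like features: for each dyadic scale,
    band-pass filter in the frequency domain, take the modulus, and
    average over time, giving one translation-invariant coefficient."""
    N = len(x)
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(N)          # normalized frequencies in [0, 0.5]
    coeffs = []
    for j in range(J):
        fc = 0.25 / 2**j                # dyadic centre frequencies
        bw = fc / 2
        H = np.exp(-((freqs - fc)**2) / (2 * bw**2))  # Gaussian band-pass
        band = np.fft.irfft(X * H, n=N)
        coeffs.append(np.mean(np.abs(band)))          # modulus + averaging
    return np.array(coeffs)

x = np.sin(2 * np.pi * 0.05 * np.arange(4096))  # test tone at normalized freq 0.05
coeffs = scatter1(x)                            # one coefficient per scale
```

Vectors like `coeffs` (one per recording) would then be the feature input to an SVM classifier in a WST-SVM setup.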