ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2019.8682324
Learning to Detect Dysarthria from Raw Speech

Abstract: Speech classifiers of paralinguistic traits traditionally learn from diverse hand-crafted low-level features, selecting the information relevant to the task at hand. We explore an alternative to this selection by learning the classifier and the feature extraction jointly. Recent work on speech recognition has shown improved performance over speech features by learning from the waveform. We extend this approach to paralinguistic classification and propose a neural network that can learn a filterbank, a no…
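The abstract describes a front end that learns the filterbank itself from the raw waveform instead of consuming hand-crafted features. A minimal sketch of such a front end is below: a bank of learnable 1-D convolution filters applied to the waveform, followed by rectification and log compression. All names, filter sizes, and the initialization here are illustrative assumptions, not the paper's actual architecture; in a real system the filter weights would be trained jointly with the classifier by backpropagation.

```python
import numpy as np

def conv1d(signal, filters, stride):
    """Valid-mode 1-D convolution of a mono waveform with a bank of filters."""
    filt_len = filters.shape[1]
    n_frames = (len(signal) - filt_len) // stride + 1
    out = np.empty((filters.shape[0], n_frames))
    for t in range(n_frames):
        window = signal[t * stride : t * stride + filt_len]
        out[:, t] = filters @ window
    return out

def learned_filterbank_features(waveform, filters, stride=160):
    """Learnable front end: convolve, rectify, log-compress.
    The filter weights are ordinary parameters, so gradients can flow
    through this function when it is part of a larger network."""
    activations = conv1d(waveform, filters, stride)
    rectified = np.abs(activations)
    return np.log1p(rectified)

rng = np.random.default_rng(0)
waveform = rng.standard_normal(16000)            # 1 s of audio at 16 kHz
filters = rng.standard_normal((40, 400)) * 0.01  # 40 filters of 25 ms each
features = learned_filterbank_features(waveform, filters)
print(features.shape)  # (40, 98): 40 channels x 98 frames at 10 ms hop
```

With a 400-sample filter and a 160-sample hop at 16 kHz, this mimics the 25 ms window / 10 ms hop framing of a conventional mel filterbank, but the filter shapes are free parameters rather than fixed triangles.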

Cited by 27 publications (17 citation statements) | References 24 publications
“…For UA-Speech and TORGO, the database is split in order to maintain a good partition of speakers with different severities or intelligibility scores between the training, validation, and test sets, without having any overlap in speakers between the different sets. A similar method of data splitting has been used, for example, in [38]. Tables 3 and 4 show the data partition in UA-Speech and TORGO, respectively.…”
Section: B. Experimental Setup
confidence: 99%
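The citation statement above describes speaker-disjoint data splitting: no speaker may appear in more than one of the training, validation, and test sets. A minimal sketch of such a split is below; the 70/15/15 speaker proportions and all names are illustrative assumptions, and a faithful reproduction would additionally balance severity or intelligibility scores across the sets, which this simplified version does not do.

```python
import random

def speaker_disjoint_split(utterances, speakers, seed=0):
    """Partition utterances into train/val/test with no speaker overlap.
    `utterances` and `speakers` are parallel lists; the split is done at
    the speaker level (roughly 70/15/15) so every utterance of a given
    speaker lands in exactly one set."""
    spk_ids = sorted(set(speakers))
    rng = random.Random(seed)
    rng.shuffle(spk_ids)
    n = len(spk_ids)
    n_train, n_val = int(0.7 * n), int(0.15 * n)
    train_spk = set(spk_ids[:n_train])
    val_spk = set(spk_ids[n_train:n_train + n_val])
    split = {"train": [], "val": [], "test": []}
    for utt, spk in zip(utterances, speakers):
        if spk in train_spk:
            split["train"].append(utt)
        elif spk in val_spk:
            split["val"].append(utt)
        else:
            split["test"].append(utt)
    return split

# Toy corpus: 10 speakers, 3 utterances each.
utterances = [f"S{i:02d}_utt{j}" for i in range(10) for j in range(3)]
speakers = [u.split("_")[0] for u in utterances]
split = speaker_disjoint_split(utterances, speakers)
```

Because dysarthric speech carries strong speaker-specific cues, an utterance-level random split would let the model recognize speakers rather than the pathology; the speaker-level split above is what makes the reported test performance meaningful.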
“…In order to develop deep learning models, existing studies have mainly used combinations of convolutional neural network (CNN) and multilayer perceptron (MLP) [31], [33]- [37]. In addition, some studies have explored combining CNN and long short-term memory (LSTM) networks [32], and combining LSTM and MLP [38] for detection of pathological voice from healthy speech. Even though different deep learning architectures have been studied in the recent pathological voice detection studies listed above, a systematic comparison between latest end-to-end methods and systems based on the traditional pipeline is still lacking in the study area.…”
Section: Introduction
confidence: 99%
“…Moreover, certain degenerative illnesses, such as amyotrophic lateral sclerosis (ALS), often cause damages to various areas of the nervous system [27] [28]. Some researchers also indicate that the disease can also have influences on other areas, such as early childhood recognition, sensory defect, and intellectual disability [29] [30][31] [32]. However, depends on the severity levels of the damage to different areas, we can still define the leading root cause of the disease, and based on that we can group dysarthria into different types.…”
Section: Dysarthria
confidence: 99%
“…Detecting dysarthria involves extracting hand-crafted acoustic features and using those features as inputs to a machine learning-based classifier [18][19][20]. Deep learning approaches are also possible where the raw speech signal or a set of elementary features are fed into complex neural network architectures that automatically determine the important acoustic information and distinguish between healthy and dysarthric speech [21,22]. Deep learning approaches require less data preparation and feature engineering but may suffer from a lack of interpretability as further post-processing is often required to interpret how the speaker's speech is impaired.…”
Section: Introduction
confidence: 99%