2022
DOI: 10.1016/j.jestch.2022.101148
Multi-modal voice pathology detection architecture based on deep and handcrafted feature fusion

Cited by 16 publications (6 citation statements)
References 36 publications
“…First, a two-stage approach is used in which expert-derived voice features, most commonly the MFCC, 6,12,15,16 are calculated from the raw voice data and used to predict vocal pathology. 28 Second, a single sustained vowel recording (e.g., selected vowel /a/ samples) is used as the initial raw data input. 15,16,29 The proposed framework offers several advantages.…”
Section: Discussion
confidence: 99%
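The two-stage pipeline this statement describes — handcrafted features (most commonly MFCCs) computed from a sustained-vowel recording, then passed to a classifier — can be sketched in NumPy. The frame length, hop size, filterbank size, and the synthetic /a/-like tone below are illustrative assumptions, not parameters from the cited papers.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Stage 1 of the two-stage approach: handcrafted MFCC features from raw voice."""
    # Frame the signal with a Hamming window
    frames = np.array([signal[s:s + n_fft] * np.hamming(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)
    # Log mel energies, then DCT-II to decorrelate -> cepstral coefficients
    logmel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return logmel @ dct.T  # shape: (n_frames, n_ceps)

# Synthetic stand-in for a sustained /a/ recording: a 220 Hz tone with one harmonic
sr = 16000
t = np.arange(sr) / sr
vowel = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
feats = mfcc(vowel, sr=sr)
print(feats.shape)  # (n_frames, 13)
```

Stage 2 (prediction) would feed `feats` (or frame-averaged statistics of it) to any classifier; the feature extractor above is fixed and expert-derived rather than learned.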
“…First, a two-stage approach is used in which expert-derived voice features, most commonly the MFCC, 6,12,15,16 are calculated from the raw voice data and used to predict vocal pathology. 28 Second, a single sustained vowel recording (e.g., selected vowel /a/ samples) is used as the initial raw data input. 15,16,29 Even when the model is trained with different sustained vowel recordings (e.g., selected vowel /a/, /i/, or /u/ samples), only an individual vowel recording is input to generate a prediction of vocal pathology.…”
Section: Discussion
confidence: 99%
“…The study used 10-fold cross-validation and accuracies of 87.11% and 86.52% for CNN and RNN, respectively. A combination of features from EGG and speech from sustained vowel /a/ has been considered for distinguishing normal and pathological voices [35], [36]. Other studies considered stacked autoencoder [37] and LSTM-based autoencoder [38] using sustained vowel /a/ and continuous speech samples, respectively.…”
Section: Introduction
confidence: 99%
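The 10-fold cross-validation mentioned here — each fold held out once for testing while the other nine train the model — can be sketched as follows. The nearest-centroid "classifier" and the synthetic two-class data are placeholders for illustration only, not the CNN/RNN models from the cited study.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(X, y, fit, predict, k=10):
    """Each fold serves once as the test set; return per-fold accuracy."""
    folds = k_fold_indices(len(X), k)
    accs = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        accs.append(float(np.mean(predict(model, X[test_idx]) == y[test_idx])))
    return accs

# Toy two-class data: well-separated Gaussian clusters in 5 dimensions
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(4, 1, (50, 5))])
y = np.array([0] * 50 + [1] * 50)

def fit(Xtr, ytr):
    # "Training" = storing the per-class centroid
    return {c: Xtr[ytr == c].mean(axis=0) for c in np.unique(ytr)}

def predict(model, Xte):
    # Assign each sample to the nearest class centroid
    cs = sorted(model)
    d = np.stack([np.linalg.norm(Xte - model[c], axis=1) for c in cs], axis=1)
    return np.array(cs)[d.argmin(axis=1)]

accs = cross_validate(X, y, fit, predict, k=10)
print(len(accs), np.mean(accs))
```

Reporting the mean (and spread) over the 10 fold accuracies is what yields summary figures like the 87.11% quoted above.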
“…Since they do not rely on an individual's judgment, these assessment techniques are objective. They are also simple to implement, because several online recording apps make speech recordings accessible from anywhere [7]. Hence, to reliably differentiate between healthy individuals and those with voice pathologies, several studies have devised voice-processing methods to determine which vocal features, combined with a suitable classification method, can detect voice pathology automatically within a single framework [8].…”
Section: Introduction
confidence: 99%