Interspeech 2019
DOI: 10.21437/interspeech.2019-1405

Phonet: A Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech

Abstract: Many features can be extracted from speech signals for applications such as automatic speech recognition and speaker verification. However, pathological speech processing requires features that describe the presence of the disease or the state of the patient and that are comprehensible to clinical experts. Phonological posteriors are a group of features that clinicians can interpret and that, at the same time, carry suitable information about the patient's speech.…

Cited by 27 publications (28 citation statements). References: 15 publications.

“…AF space: There are different ways to represent phonemes as articulatory features such as binary features [26] or multivalued features [27]. In this work, we conducted studies with binary features and multi-valued AF representations: (a) AF binary : We used Phonet toolkit [28], which consists of 18 recurrent neural network-based binary AF classifiers trained on 17 hours of clean FM podcasts in Mexican Spanish. We extracted 18 AF binary probability vectors and used them as the posterior feature, i.e.…”
Section: B Posterior Feature Estimators (mentioning)
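The excerpt above describes Phonet as a bank of 18 binary classifiers whose per-frame outputs are stacked into a posterior feature. The following is a minimal sketch of that stacking, assuming each trained classifier can be treated as a function from one frame's acoustic features to a probability. The seeded logistic units built by `make_stub_classifier` are hypothetical stand-ins for Phonet's recurrent classifiers, and the 13-dimensional input frames are an illustrative assumption; none of these names belong to Phonet's API.

```python
import math
import random
from typing import Callable, List

NUM_CLASSES = 18  # number of binary phonological classifiers reported for Phonet

Frame = List[float]
Classifier = Callable[[Frame], float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def make_stub_classifier(dim: int, seed: int) -> Classifier:
    """Hypothetical stand-in for one trained binary classifier.

    Phonet uses recurrent networks; a seeded logistic unit is used here only
    so the example runs without trained weights."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    b = rng.uniform(-1.0, 1.0)

    def classify(frame: Frame) -> float:
        # probability that the phonological class is present in this frame
        return sigmoid(sum(wi * xi for wi, xi in zip(w, frame)) + b)

    return classify


def posterior_matrix(frames: List[Frame], classifiers: List[Classifier]) -> List[List[float]]:
    """Run every classifier on every frame: returns a T x K matrix of probabilities."""
    return [[clf(frame) for clf in classifiers] for frame in frames]


if __name__ == "__main__":
    rng = random.Random(0)
    # 5 frames of 13-dimensional features (illustrative sizes only)
    frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(5)]
    bank = [make_stub_classifier(13, seed=k) for k in range(NUM_CLASSES)]
    post = posterior_matrix(frames, bank)
    print(len(post), len(post[0]))  # prints: 5 18
```

Each row of the result is one frame's posterior feature; in a real system the stub classifiers would be replaced by the trained networks.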
“…In the Phonet toolkit¹ [28], these AFs are modeled by 18 off-the-shelf recurrent neural network (RNN) based binary classifiers, i.e. D = 18 × 2.…”
Section: A Phonetic Posterior Feature Representations (mentioning)
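One reading of D = 18 × 2 is that each of the 18 binary classifiers contributes a two-way distribution (class absent, class present) per frame. The sketch below assumes that reading and assumes each classifier emits a pair of logits; the function names and the (absent, present) ordering are hypothetical. Concatenating the per-classifier softmax outputs gives a 36-dimensional vector per frame.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stack_binary_posteriors(frame_logits: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the 2-way softmax of each binary classifier for one frame.

    frame_logits: K pairs of logits, assumed ordered as (absent, present).
    Returns a vector of length K * 2."""
    feature: List[float] = []
    for pair in frame_logits:
        feature.extend(softmax(pair))
    return feature


def stack_sequence(logits_per_frame: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Apply the per-frame stacking to a whole utterance (T frames)."""
    return [stack_binary_posteriors(frame) for frame in logits_per_frame]


if __name__ == "__main__":
    vec = stack_binary_posteriors([[0.0, 1.0]] * 18)
    print(len(vec))  # prints: 36
```

Keeping both probabilities per classifier is redundant (they sum to 1), but it matches the stated dimension D = 18 × 2.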
“…A softmax activation function is used to compute the sequence of phoneme posterior probabilities for frames 1, 2, …, T. Bidirectional recurrent nets are used in this work because they have shown better results than standard GRUs in similar speech processing tasks [23,24].…”
Section: Recurrent Neural Network With Convolution Layers (mentioning)
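The excerpt describes a softmax over per-frame phoneme scores computed on top of bidirectional GRUs. Here is a minimal, untrained sketch in pure Python: one GRU reads the frames forward, a second reads them in reverse, their hidden states are concatenated frame by frame, and a linear layer plus softmax yields one posterior vector per frame. The layer sizes, the random initialisation, the output projection `W_out`, and the gate convention h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ n_t are assumptions; the convolution layers named in the section title are omitted.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vs: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(z: Vector) -> Vector:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Minimal GRU cell with randomly initialised (untrained) weights."""

    def __init__(self, in_dim: int, hid: int, rng: random.Random):
        self.hid = hid
        # one input matrix, recurrent matrix and bias per gate: update z, reset r, candidate h
        self.W = {g: rand_matrix(hid, in_dim, rng) for g in "zrh"}
        self.U = {g: rand_matrix(hid, hid, rng) for g in "zrh"}
        self.b = {g: [0.0] * hid for g in "zrh"}

    def step(self, x: Vector, h: Vector) -> Vector:
        z = [sigmoid(v) for v in add(matvec(self.W["z"], x), matvec(self.U["z"], h), self.b["z"])]
        r = [sigmoid(v) for v in add(matvec(self.W["r"], x), matvec(self.U["r"], h), self.b["r"])]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate applied to previous state
        n = [math.tanh(v) for v in add(matvec(self.W["h"], x), matvec(self.U["h"], rh), self.b["h"])]
        # interpolate between the previous state and the candidate state
        return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]

    def run(self, xs: List[Vector]) -> List[Vector]:
        h = [0.0] * self.hid
        out = []
        for x in xs:
            h = self.step(x, h)
            out.append(h)
        return out


def bigru_posteriors(xs: List[Vector], fwd: GRUCell, bwd: GRUCell, W_out: Matrix) -> List[Vector]:
    """Frame-wise class posteriors from a bidirectional GRU followed by softmax."""
    hf = fwd.run(xs)
    hb = bwd.run(xs[::-1])[::-1]  # backward pass, re-aligned to forward time order
    return [softmax(matvec(W_out, f + b)) for f, b in zip(hf, hb)]


if __name__ == "__main__":
    rng = random.Random(0)
    in_dim, hid, n_phonemes, T = 13, 8, 5, 4  # illustrative sizes only
    fwd, bwd = GRUCell(in_dim, hid, rng), GRUCell(in_dim, hid, rng)
    W_out = rand_matrix(n_phonemes, 2 * hid, rng)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(T)]
    for p in bigru_posteriors(xs, fwd, bwd, W_out):
        print([round(v, 3) for v in p])
```

The backward outputs are reversed again before concatenation so that each frame's forward and backward states describe the same time step.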
“…The different phonemes of the Spanish language are grouped into 18 phonological posteriors. The phonological posteriors were computed with a bank of parallel recurrent neural networks to estimate the probability of occurrence of a specific phonological class [13].…”
Section: Speech Features (mentioning)
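Grouping phonemes into phonological classes means each class defines a binary target: a frame is positive if its phoneme belongs to the class. The sketch below derives one 0/1 target sequence per class from frame-level phoneme labels, which is the kind of supervision a bank of parallel binary classifiers needs. The class names and memberships in `PHONOLOGICAL_CLASSES` are a hypothetical, partial Spanish example, not Phonet's actual 18 classes, and the frame-level phoneme alignment is assumed to be given.

```python
from typing import Dict, List, Set

# Hypothetical, partial phoneme-to-class map for illustration only;
# it is NOT Phonet's actual list of 18 phonological classes.
PHONOLOGICAL_CLASSES: Dict[str, Set[str]] = {
    "nasal": {"m", "n", "ɲ"},
    "bilabial": {"p", "b", "m"},
    "vowel": {"a", "e", "i", "o", "u"},
}


def binary_targets(frame_phonemes: List[str], cls: str) -> List[int]:
    """1 for frames whose phoneme belongs to the class, else 0."""
    members = PHONOLOGICAL_CLASSES[cls]
    return [1 if ph in members else 0 for ph in frame_phonemes]


def all_targets(frame_phonemes: List[str]) -> Dict[str, List[int]]:
    """One binary target sequence per class (one per parallel classifier)."""
    return {cls: binary_targets(frame_phonemes, cls) for cls in PHONOLOGICAL_CLASSES}


if __name__ == "__main__":
    # frame-level labels for the word "mano" (assumed alignment)
    frames = ["m", "m", "a", "a", "n", "o"]
    print(all_targets(frames))
    # prints: {'nasal': [1, 1, 0, 0, 1, 0], 'bilabial': [1, 1, 0, 0, 0, 0], 'vowel': [0, 0, 1, 1, 0, 1]}
```

A phoneme may belong to several classes (here /m/ is both nasal and bilabial), which is why each class gets its own binary classifier rather than one multi-class output.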