Interspeech 2016
DOI: 10.21437/interspeech.2016-124
Interpretation of Low Dimensional Neural Network Bottleneck Features in Terms of Human Perception and Production

Abstract: Low-dimensional 'bottleneck' features extracted from neural networks have been shown to give phoneme recognition accuracy similar to that obtained with higher-dimensional MFCCs, using GMM-HMM models. Such features have also been shown to preserve well the assumptions of speech trajectory dynamics made by dynamic models of speech such as Continuous-State HMMs. However, little is understood about how networks derive these features and how and whether they can be interpreted in terms of human speech perception an…

Cited by 4 publications (6 citation statements). References 32 publications.
“…This paper extends our previous study of very low-dimensional BNFs, including phone classification [1] and visualization and interpretation [2]. Our objective is to determine whether it is advantageous for phone-classification of feature vectors to treat the acoustic space A as a non-linear manifold, in which several BPC-dependent DNNs rather than a single DNN are used for phone classification.…”
Section: Introduction
confidence: 89%
“…Deep neural networks (DNNs) trained with phone posterior probability targets can be used to create very low-dimensional discriminative representations of speech, called bottleneck features (BNFs). In automatic speech recognition (ASR) experiments, BNFs with as few as 9 dimensions perform as well as 39 dimensional features based on conventional mel frequency cepstral coefficients (MFCCs) [1], have an intuitive dynamical structure, and can be interpreted in terms of human perception and production [2].…”
Section: Introduction
confidence: 99%
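The statement above describes how BNFs are obtained: a feedforward network trained on phone-posterior targets with a narrow hidden layer, whose activations serve as a compact (e.g. 9-dimensional) feature vector. A minimal sketch of that architecture, with random untrained weights and illustrative layer sizes (39-dim MFCC-like input, 9-unit bottleneck, 48 phone classes — all hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feedforward network with a narrow "bottleneck" hidden layer.
# Layer sizes are illustrative only; a real network would be trained
# on phone-label targets before its bottleneck activations are useful.
sizes = [39, 256, 256, 9, 48]          # 39-dim input, 9-dim bottleneck, 48 phones
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(sizes[:-1], sizes[1:])]
BOTTLENECK_DIM = 9

def forward(x):
    """Return (phone posteriors, bottleneck activations) for input frames x."""
    h = x
    bottleneck = None
    for i, W in enumerate(weights):
        h = h @ W
        if i < len(weights) - 1:
            h = np.tanh(h)                      # hidden non-linearity
        if h.shape[1] == BOTTLENECK_DIM:
            bottleneck = h                      # BNFs are read off here
    # softmax over phone classes
    e = np.exp(h - h.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True), bottleneck

frames = rng.standard_normal((5, 39))           # 5 fake acoustic frames
post, bnf = forward(frames)
```

At recognition time only `bnf` is kept, replacing the 39-dimensional MFCC vector as the observation fed to a GMM-HMM system.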
“…In addition to their use as feature vectors for speech recognition, BNFs are of interest because of their utility for visualization of speech signals [22]. This is investigated in Section 4.…”
Section: DNN-HMM
confidence: 99%
“…It was shown in [22] that low dimensional projections of BNFs can represent speech sounds in a topology that broadly reflects their phonetic properties, and that variations due to different initializations of the DNN may be compensated by suitable linear transformations. Motivated by this we hypothesized that a BNF induced image might offer an insight on how speech sounds evolve as a function of age.…”
Section: Visualization
confidence: 99%
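The statement above notes that BNF spaces from differently initialized DNNs can be brought into agreement by a suitable linear transformation. A minimal sketch of that compensation step on synthetic data (the 3-D embeddings and the least-squares fit are illustrative assumptions, not the paper's procedure):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: two 3-D BNF sets for the same frames that differ
# only by an unknown linear transform, as might result from training
# the same network from two different random initializations.
A = rng.standard_normal((100, 3))      # BNFs from network 1
T_true = rng.standard_normal((3, 3))   # unknown linear relation
B = A @ T_true                         # BNFs from network 2

# Least-squares estimate of the compensating linear transformation.
T_est, *_ = np.linalg.lstsq(A, B, rcond=None)
```

With the estimated transform, network 1's features can be mapped into network 2's coordinate frame, so visualizations from independently trained networks become directly comparable.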
“…Specifically, 9-dimensional (9D) BNFs extracted from a phone discrimination bottleneck neural network provided better ASR phone accuracies than 39-dimensional Mel-frequency cepstral coefficients (MFCCs) in conventional GMM-HMM ASR systems. In [15], we report visualisations and interpretations of 3-dimensional (3D) BNFs and argue that the bottleneck neural networks derive representations specific to particular phonetic categories, with properties similar to those used by human perception. In this paper we extend this research and try to explore how these bottleneck neural networks learn phonetic information.…”
Section: Introduction
confidence: 99%