Interspeech 2018
DOI: 10.21437/interspeech.2018-2462
Exploring How Phone Classification Neural Networks Learn Phonetic Information by Visualising and Interpreting Bottleneck Features

Abstract: Neural networks have a reputation for being "black boxes", and it has been suggested that techniques from user-interface development, and visualisation in particular, could help lift this reputation. In this paper, we explore 9-dimensional bottleneck features (BNFs) that our earlier work has shown to represent speech well in the context of speech recognition, and 2-dimensional BNFs extracted directly from bottleneck neural networks. The 9-dimensional BNFs obtained from a phone classification neural network are visu…

Cited by 2 publications (2 citation statements; citing papers published 2019–2020)
References 18 publications
“…There have been many explorations toward interpreting neural network outputs in the speech processing area. Bai et al. [20] proposed to use linear discriminant analysis (LDA) and t-distributed stochastic neighbor embedding (t-SNE) to analyze 9-dimensional bottleneck features (BNFs). Karita et al. [21] used t-SNE to visually show how features are mixed or split with an inter-domain loss.…”
Section: Related Work
confidence: 99%
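The citation statement above describes reducing 9-dimensional BNFs with LDA (and t-SNE) for visualisation. As a minimal illustration of that style of analysis — not the cited authors' code — the sketch below implements multi-class Fisher LDA in plain NumPy and projects synthetic 9-dimensional vectors, standing in for BNFs, down to 2-D for a scatter plot. The data, class counts, and dimensions are all assumptions for illustration.

```python
import numpy as np

def lda_project(X, y, n_components=2):
    """Project features X (N, D) with labels y to n_components via Fisher LDA."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    D = X.shape[1]
    Sw = np.zeros((D, D))  # within-class scatter
    Sb = np.zeros((D, D))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Directions maximising between/within scatter: eigenvectors of Sw^{-1} Sb
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:n_components]].real
    return X @ W

# Synthetic stand-in for 9-dimensional BNFs with 3 "phone" classes
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(loc=i * 3.0, scale=1.0, size=(50, 9)) for i in range(3)])
y = np.repeat(np.arange(3), 50)
Z = lda_project(X, y)  # (150, 2) points, ready for a 2-D scatter plot
print(Z.shape)
```

A library implementation (e.g. scikit-learn's `LinearDiscriminantAnalysis` or `TSNE`) would normally replace this hand-rolled version; the point here is only the shape of the analysis: labelled high-dimensional features in, low-dimensional coordinates out.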
“…Previous work on exploring how neural networks represent different aspects of speech has focused primarily on investigating the learning of phonetic representations [19,20,21]. This typically involves two general approaches: (i) an unsupervised one, in which a neural network autoencoder is used to investigate whether the network is capable of learning phoneme-like representations without explicit labels [22], and (ii) a supervised one, in which the neural network is trained with phone labels for the task of phoneme recognition [20,19]. In both cases, investigations focus on analyzing the representational properties of the complex features learned at different layers of the network, and also at different nodes.…”
Section: Introduction
confidence: 99%
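In the supervised approach described above, a classifier is trained on phone labels with a deliberately narrow hidden layer, and the BNFs are read out from that layer. Purely as an illustration of where such features come from — random weights and hypothetical layer sizes, not the architecture of any cited paper — a forward pass might look like:

```python
import numpy as np

# Illustrative shapes only: a hypothetical phone classifier with a 2-unit
# bottleneck. Weights are random; a real model would be trained with
# cross-entropy on phone labels, and BNFs read from the bottleneck layer.
rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

D_in, H, D_bn, n_phones = 40, 128, 2, 48   # assumed sizes, not from the paper
W1 = rng.normal(size=(D_in, H)) * 0.1
W2 = rng.normal(size=(H, D_bn)) * 0.1      # projection into the bottleneck
W3 = rng.normal(size=(D_bn, n_phones)) * 0.1

x = rng.normal(size=(5, D_in))             # 5 acoustic frames
bnf = relu(x @ W1) @ W2                    # 2-dimensional bottleneck features
logits = bnf @ W3                          # phone-classification scores
print(bnf.shape, logits.shape)
```

Because every frame must pass through the 2-unit layer to be classified, training forces the bottleneck activations to carry the phonetically relevant information — which is what makes them directly plottable in the visualisation work discussed here.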