2018 IEEE Spoken Language Technology Workshop (SLT)
DOI: 10.1109/slt.2018.8639648
Rapid Speaker Adaptation of Neural Network Based Filterbank Layer for Automatic Speech Recognition

Cited by 5 publications (7 citation statements) | References 18 publications
“…Sailor and Patil [22] indeed showed that their proposed convolutional restricted Boltzmann machine (RBM) model learns different centre frequencies depending on the task at hand. Our work is perhaps most closely related to Seki et al. [21], who proposed to adapt a filterbank composed of differentiable functions such as Gaussian or Gammatone filters. They demonstrated more than 7% relative reductions in WER when adapting to speakers in a spontaneous Japanese speech transcription task.…”
Section: Introduction
confidence: 92%
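The filterbank adaptation described in this excerpt lends itself to a short sketch. Below is a minimal, hypothetical PyTorch layer (not Seki et al.'s exact formulation; the class name, initialisation, and dimensions are illustrative assumptions) in which each Gaussian filter's centre frequency and width are trainable parameters, so they can be fine-tuned on a speaker's data by back-propagation:

```python
# A minimal sketch of a filterbank layer with learnable Gaussian filters,
# in the spirit of the adaptation scheme described above. All names,
# initial values, and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class GaussianFilterbank(nn.Module):
    def __init__(self, n_filters=40, n_fft_bins=257):
        super().__init__()
        # Centre frequencies spread over a normalised frequency axis [0, 1];
        # both centres and (log-)widths are trainable adaptation parameters.
        self.centers = nn.Parameter(torch.linspace(0.05, 0.95, n_filters))
        self.log_widths = nn.Parameter(torch.full((n_filters,), -3.0))
        self.register_buffer("freqs", torch.linspace(0.0, 1.0, n_fft_bins))

    def forward(self, power_spec):
        # power_spec: (batch, time, n_fft_bins)
        widths = self.log_widths.exp()                       # (n_filters,)
        diff = self.freqs[None, :] - self.centers[:, None]   # (n_filters, bins)
        filters = torch.exp(-0.5 * (diff / widths[:, None]) ** 2)
        # Filterbank energies with log compression, as in log-mel features.
        return torch.log(power_spec @ filters.T + 1e-6)


fbank = GaussianFilterbank()
feats = fbank(torch.rand(2, 100, 257))   # -> (2, 100, 40)
```

During speaker adaptation, only `centers` and `log_widths` would be updated while the rest of the acoustic model stays fixed, which keeps the number of speaker-specific parameters very small.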
“…However, the filter gains may be suitable targets for adaptation, for which we would like to attribute importance to the output of individual filters with a small number of parameters. This has similarly been done with learnable filterbanks in traditional feature extraction pipelines [21]. We also briefly note that if we were to scale the gain of each filter, then this would correspond to a version of feature-space Maximum Likelihood Linear Regression (fMLLR) [3] with a diagonal matrix and no bias, or similarly to Learning Hidden Unit Contributions (LHUC) [2], which scales the output of each neuron by a scalar r^(i) for filter i:…”
Section: VTLN Typically Uses a Scaling Function That Is Assumed
confidence: 99%
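The per-filter gain scaling described in this excerpt can be sketched directly: each filterbank output is multiplied by a speaker-dependent scalar r^(i). The module name below is hypothetical, and the bounded 2·sigmoid parameterisation is borrowed from the LHUC idea rather than taken from this paper:

```python
# A minimal sketch of LHUC-style per-filter gains: each filter output is
# scaled by a trainable speaker-dependent scalar r^(i). Names and the
# 2*sigmoid reparameterisation are illustrative assumptions.
import torch
import torch.nn as nn


class FilterGains(nn.Module):
    def __init__(self, n_filters=40):
        super().__init__()
        # One trainable scalar per filter; only these are updated at adaptation time.
        self.r = nn.Parameter(torch.zeros(n_filters))

    def forward(self, fbank_feats):
        # fbank_feats: (batch, time, n_filters)
        return 2.0 * torch.sigmoid(self.r) * fbank_feats


gains = FilterGains()
adapted = gains(torch.randn(2, 100, 40))   # same shape, per-filter rescaled
```

Applied in feature space, such a diagonal rescaling with no bias is the special case of fMLLR mentioned in the quotation.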
“…Then feature reduction/selection is one of the methods to solve this issue, and many types of feature reduction are employed in ASR. SVD is a popular and well-known method that has been applied to test recognition performance [15][16][17]. Therefore, SVD was employed to reduce the number of features in this study.…”
Section: Literature Review
confidence: 99%
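As a rough illustration of the SVD-based feature reduction this excerpt refers to (the function name and the choice of k are hypothetical; the cited study's exact pipeline may differ), the feature matrix can be projected onto its top singular directions:

```python
# A minimal sketch of SVD-based feature reduction: project acoustic feature
# vectors onto the top-k right singular vectors of the (centred) feature
# matrix. Names and dimensions are illustrative assumptions.
import numpy as np


def svd_reduce(features, k):
    # features: (n_frames, n_dims) matrix of acoustic feature vectors.
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T   # (n_frames, k) reduced features


reduced = svd_reduce(np.random.randn(1000, 120), k=40)
```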
“…The recognition performance of an Automatic Speech Recognition (ASR) system is affected by speaker variations. Speaker adaptation in conventional DNN-HMM based systems was explored in [1,2,3,4,5,6]. i-vectors appended to input features have been shown to improve the model performance.…”
Section: Introduction
confidence: 99%
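Appending an i-vector to the input features, as mentioned in this excerpt, simply means concatenating the same per-speaker vector to every acoustic frame before it enters the network. A minimal sketch (dimensions and names are illustrative assumptions):

```python
# A minimal sketch of i-vector augmentation: tile the speaker's i-vector over
# all frames and concatenate it to the acoustic features. Dimensions are
# illustrative assumptions.
import numpy as np


def append_ivector(frames, ivector):
    # frames: (n_frames, feat_dim); ivector: (ivec_dim,) for the current speaker.
    tiled = np.tile(ivector, (frames.shape[0], 1))
    return np.concatenate([frames, tiled], axis=1)   # (n_frames, feat_dim + ivec_dim)


augmented = append_ivector(np.random.randn(300, 40), np.random.randn(100))
```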