Abstract. This paper addresses the automatic optimization of free decoding parameters. We propose using a Simplified Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm to optimize these parameters. This method significantly reduces computational and labor costs. We also demonstrate that, for all the databases we tested, the proposed method successfully optimizes the parameters for a specific target real-time factor (RTF).
Keywords: Simplified Simultaneous Perturbation Stochastic Approximation, SPSA, decoding parameter, real-time factor, RTF, speech recognition.
Introduction
The balance between accuracy and speed in automatic speech recognition depends on solving a number of related tasks, such as:
─ optimization of the acoustic model;
─ optimization of the language model;
─ optimization of a large set of free decoding parameters.
Optimization of both the acoustic model and the language model in automatic speech recognition for large vocabularies is a well-known task [1]. In contrast, the problem of optimizing free decoding parameters is still often solved manually or by grid search (i.e. searching for values on a grid with a specified step). The task is complicated by the fact that each parameter can have a different impact on recognition accuracy and/or the expected decoding time. Moreover, optimal decoding parameters must be searched for anew for each new domain and every time the training data change. Lastly, changing the hardware configuration also requires readjusting the decoding parameters.
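To illustrate why grid search becomes expensive, the following minimal sketch counts the decoding runs needed for an exhaustive grid and compares that count with the two objective evaluations per iteration used by the standard SPSA algorithm. The parameter names, ranges, steps and the iteration budget are assumptions chosen only for illustration; they are not taken from the experiments in this paper.

```python
from itertools import product


def grid_axis(low, high, step):
    """Return low, low + step, ... up to and including high."""
    count = int(round((high - low) / step)) + 1
    return [low + i * step for i in range(count)]


# Hypothetical decoding parameters and ranges (illustration only).
grid = {
    "beam": grid_axis(8.0, 16.0, 2.0),          # 5 values
    "lattice_beam": grid_axis(4.0, 10.0, 2.0),  # 4 values
    "max_active": grid_axis(2000, 8000, 2000),  # 4 values
    "lm_scale": grid_axis(8.0, 14.0, 1.0),      # 7 values
}

# Every combination of values is one grid point, i.e. one full decoding run.
points = list(product(*grid.values()))
grid_evaluations = len(points)

# Standard SPSA evaluates the objective twice per iteration,
# regardless of how many parameters are being tuned.
spsa_iterations = 50  # assumed budget, not a value from the paper
spsa_evaluations = 2 * spsa_iterations

print(f"grid search: {grid_evaluations} decoding runs")
print(f"SPSA:        {spsa_evaluations} decoding runs")
```

The grid count is the product of the per-parameter value counts, so it grows multiplicatively with every added parameter, whereas the SPSA count depends only on the number of iterations.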
Typically, the search for optimal decoding parameters that satisfy the real-time factor constraints and at the same time provide high recognition accuracy is a very time-consuming task.
In this paper, we present a Simplified Simultaneous Perturbation Stochastic Approximation (Simplified SPSA) for optimizing free decoding parameters. The proposed method significantly reduces computational costs compared to [2], and the reduction is even greater compared to grid search. In contrast to [3] and [4], Simplified SPSA takes into account the real-time factor, which is of vital importance for the design of an ASR system. The proposed method also requires lower computational costs than [1] and [2] for finding the optimal accuracy corresponding to a specific real-time factor. We introduce a penalty function, which is used to achieve a balance between recognition accuracy and decoding time. We then demonstrate that this method provides robust and fast results. We present results obtained on three speech databases comprising spontaneous and read speech.
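As a rough illustration of how a penalty can trade recognition accuracy against decoding time, the sketch below adds a term to the word error rate whenever the measured real-time factor exceeds the target. The hinge form of the penalty, the function name `penalized_objective` and the weight value are assumptions made for this example; the penalty function actually used in this paper is not reproduced here.

```python
def penalized_objective(wer, rtf, target_rtf, penalty_weight=100.0):
    """Hypothetical objective: WER plus a penalty for exceeding the RTF target.

    wer            word error rate in percent (lower is better)
    rtf            measured real-time factor of the decoding run
    target_rtf     real-time factor the system is required to meet
    penalty_weight assumed cost, in WER points, per unit of excess RTF
    """
    excess = max(0.0, rtf - target_rtf)  # zero when the target is met
    return wer + penalty_weight * excess


# A configuration that meets the target is not penalized.
within = penalized_objective(wer=20.0, rtf=0.8, target_rtf=1.0)

# A slightly more accurate but too slow configuration scores worse.
over = penalized_objective(wer=19.0, rtf=1.2, target_rtf=1.0)

print(within, over)
```

With such an objective, an optimizer that minimizes a single scalar is steered toward parameter settings that satisfy the target real-time factor before it spends effort on further accuracy gains.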
Simultaneous Perturbation Stochastic Approximation (SPSA)
Let us start by describing the standard form of the SPSA algorithm [5]. We denote the vector of free decoding parameters as $\theta$. Let $\hat{\theta}_k$ denote the estimate for $\theta$ at the $k$th iteration. Then the algorithm has the standard form:
$$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k \hat{g}_k(\hat{\theta}_k),$$
where $\hat{g}_k(\cdot)$ is an estimate for the gradient at the $k$th iteration. The gain sequence $a_k$ satisfies c...