2019
DOI: 10.1109/taslp.2018.2871755

Evolution-Strategy-Based Automation of System Development for High-Performance Speech Recognition


Cited by 15 publications (8 citation statements); references 25 publications.
“…Data Types: Evolutionary DNN construction approaches have been applied to various data types, such as images [13], [59], [108], [109], speech [128], [133], [148], and texts [15], [110]. In particular, tremendous research effort has been devoted to solving the image classification problem.…”
Section: A. Applications (mentioning; confidence: 99%)
“…On CIFAR-100 and ImageNet, the DNN models constructed by EAs have achieved competitive performance in comparison to handcrafted DNN models [13], [15]. Besides image classification, the DNN models designed by EAs have demonstrated great successes in object identification [17], [185], speech recognition [128], and emotion recognition [40].…”
Section: A. Applications (mentioning; confidence: 99%)
“…The LAS model is a Seq2Seq model based on the attention mechanism. Its goal is to maximize the conditional probability of the output character sequence given the input speech [28], [29]. The model is trained directly on the input speech feature sequence and its corresponding character sequence.…”
Section: A. Listen, Attend and Spell Model (mentioning; confidence: 99%)
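To make the quoted objective concrete, the following is a minimal sketch of an attention-based listener/speller model trained with teacher forcing; the character-level cross-entropy loss corresponds to maximizing the conditional probability P(y | x) factorized over output characters. This is an illustrative assumption in PyTorch, not the cited paper's implementation; all module names, dimensions, and the 30-symbol vocabulary are hypothetical.

```python
import torch
import torch.nn as nn

class Listener(nn.Module):
    """Encodes the input speech feature sequence into hidden states."""
    def __init__(self, feat_dim=40, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)

    def forward(self, feats):                 # feats: (B, T, feat_dim)
        enc, _ = self.rnn(feats)              # enc: (B, T, 2 * hidden)
        return enc

class Speller(nn.Module):
    """Attends over encoder states and predicts one character per step."""
    def __init__(self, vocab=30, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        # batch_first for MultiheadAttention assumes a recent PyTorch (>= 1.9).
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.rnn = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, enc, prev_chars):       # prev_chars: (B, U)
        q = self.embed(prev_chars)            # (B, U, hidden)
        ctx, _ = self.attn(q, enc, enc)       # per-step attention context
        dec, _ = self.rnn(torch.cat([q, ctx], dim=-1))
        return self.out(dec)                  # character logits: (B, U, vocab)

# Teacher-forced training step: cross-entropy over characters equals the
# negative log conditional likelihood  -sum_t log P(y_t | y_<t, x).
listener, speller = Listener(), Speller()
feats = torch.randn(2, 120, 40)               # toy batch of speech features
prev = torch.randint(0, 30, (2, 15))          # previous characters (shifted targets)
target = torch.randint(0, 30, (2, 15))        # target characters
logits = speller(listener(feats), prev)
loss = nn.functional.cross_entropy(logits.reshape(-1, 30), target.reshape(-1))
```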
“…1) This paper presents the first use of DARTS-based NAS techniques to automatically learn architecture hyperparameters that directly affect the performance and model complexity of state-of-the-art LF-MMI trained TDNN-F acoustic models. In contrast, previous NAS research conducted on similar systems either used a) evolutionary algorithms requiring expert setting of initial genes and a long evaluation time for each individual candidate architecture [35] (up to 4 days even with a manual early-stopping mechanism), while in our NAS approaches the entire architecture search is performed over all possible 7^28 candidate systems and the model training cycle is limited to approximately 6.6 GPU days; or b) an architecture-sampling-based straight-through gradient approach [39] on a TDNN-CTC end-to-end system, producing much higher WERs (12.6% and 23.2%) on the swbd and callhm subsets of the Hub5'00 test set than our NAS auto-configured TDNN-F systems on the same data (6.9% and 13.0%) presented in this paper. 2) To facilitate efficient search over a very large number of TDNN-F systems, this paper presents the first use of a flexible model parameter sharing scheme that is tailor-designed for specific hyper-parameters contained in TDNN-F systems, to the best of our knowledge.…”
Section: Introduction (mentioning; confidence: 99%)
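For context on the DARTS-style search the quote contrasts with the evolutionary approach of [35], here is a minimal sketch of differentiable selection over candidate layer widths, the kind of bottleneck hyper-parameter described for TDNN-F layers: each candidate operation's output is mixed with softmax weights over learnable architecture parameters, which are trained jointly with the model weights and later discretized by keeping the highest-weighted candidate. The candidate widths, class name, and PyTorch setting are illustrative assumptions, not the cited systems' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedWidthLayer(nn.Module):
    """Softmax-weighted mixture over candidate bottleneck widths (DARTS-style)."""
    def __init__(self, in_dim=256, out_dim=256, widths=(64, 128, 256)):
        super().__init__()
        self.ops = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, w), nn.ReLU(), nn.Linear(w, out_dim))
            for w in widths
        )
        # Architecture parameters: learned jointly with the layer weights.
        self.alpha = nn.Parameter(torch.zeros(len(widths)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)           # selection probabilities
        return sum(w * op(x) for w, op in zip(weights, self.ops))

layer = MixedWidthLayer()
out = layer(torch.randn(8, 256))
# After search, the width with the largest alpha is kept and the rest are
# discarded, yielding a discrete architecture without training and evaluating
# each candidate separately, unlike the evolutionary search described in [35].
```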
“…4) This paper further presents the earliest work on analysing the efficacy of NAS approaches when used to minimize the structural redundancy in TDNN-F systems and reduce their model parameter uncertainty given limited training data. In contrast, only speech recognition accuracy and model size reduction are investigated in previous research [35]-[42], [70]. The rest of this paper is organized as follows.…”
Section: Introduction (mentioning; confidence: 99%)