ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9413630

Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks

Abstract: Deep neural network (DNN) based automatic speech recognition (ASR) systems are often designed using expert knowledge and empirical evaluation. In this paper, a range of neural architecture search (NAS) techniques are used to automatically learn two types of hyperparameters of state-of-the-art factored time delay neural networks (TDNNs): i) the left and right splicing context offsets; and ii) the dimensionality of the bottleneck linear projection at each hidden layer. These include the DARTS method integratin…
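As a rough illustration of the search space described in the abstract (not the authors' implementation), the PyTorch sketch below shows a DARTS-style softmax relaxation over candidate bottleneck projection dimensions for a single factored TDNN layer. The layer widths, candidate dimensions, and the weight-sharing scheme (a shared full-width projection whose first d outputs realise a d-dimensional candidate) are assumptions made for illustration; the splicing context offsets could be relaxed in the same way, with one architecture weight per candidate offset set.

# Minimal sketch (not the authors' code) of a DARTS-style softmax relaxation
# over candidate bottleneck dimensions for one factored TDNN (TDNN-F) layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SearchableBottleneck(nn.Module):
    def __init__(self, in_dim=1536, max_bottleneck=256,
                 candidate_dims=(64, 128, 192, 256)):
        super().__init__()
        self.candidate_dims = candidate_dims
        # Shared projection; a candidate of width d keeps its first d outputs.
        self.proj = nn.Linear(in_dim, max_bottleneck, bias=False)
        # One architecture parameter (alpha) per candidate width.
        self.alpha = nn.Parameter(torch.zeros(len(candidate_dims)))

    def forward(self, x):
        full = self.proj(x)                       # (batch, max_bottleneck)
        weights = F.softmax(self.alpha, dim=0)    # architecture weights
        out = torch.zeros_like(full)
        for w, d in zip(weights, self.candidate_dims):
            # Zero out dimensions beyond d so this candidate acts as a
            # d-dimensional bottleneck, then mix candidates by softmax weight.
            masked = torch.cat([full[:, :d],
                                torch.zeros_like(full[:, d:])], dim=1)
            out = out + w * masked
        return out

    def selected_dim(self):
        # After the search, keep the candidate with the largest weight.
        return self.candidate_dims[int(self.alpha.argmax())]

layer = SearchableBottleneck()
y = layer(torch.randn(8, 1536))
print(y.shape, layer.selected_dim())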

Citations: cited by 7 publications (10 citation statements)
References: 25 publications
“…The random search, which selects the best-performing architecture from 5 randomly sampled architectures, turns out to be a considerably strong baseline that achieves performance comparable to the system found by joint optimization. This coincides with observations in previous literature [12,17]. We also need to note that the baseline architectures are already optimized by the authors, which provides a sufficiently good starting point for the random search.…”
Section: Results (supporting)
confidence: 88%
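The random-search baseline mentioned in the statement above can be sketched as follows: draw a handful of architectures from the search space, evaluate each, and keep the best. The helper names (sample_architecture, train_and_evaluate) and the search-space choices below are hypothetical placeholders, not taken from the cited work.

# Minimal sketch of a random-search baseline over a TDNN-F-style search space.
import random

def sample_architecture(rng, num_layers=12,
                        context_choices=(0, 1, 3),
                        dim_choices=(64, 128, 192, 256)):
    # One (context offset, bottleneck dim) choice per layer.
    return [{"context": rng.choice(context_choices),
             "bottleneck_dim": rng.choice(dim_choices)}
            for _ in range(num_layers)]

def train_and_evaluate(arch):
    # Placeholder: in practice, build and train the TDNN-F for this
    # architecture and return its dev-set word error rate.
    return random.uniform(9.0, 12.0)

def random_search(num_samples=5, seed=0):
    rng = random.Random(seed)
    candidates = [sample_architecture(rng) for _ in range(num_samples)]
    scored = [(train_and_evaluate(a), a) for a in candidates]
    return min(scored, key=lambda s: s[0])   # lowest WER wins

best_wer, best_arch = random_search()
print(best_wer, best_arch[0])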
“…Jointly training the architecture weights and the model parameters saves time. However, the sub-optimal operations (e.g., simpler operations) may dominate the weights at an early stage, such that the optimal operations (e.g., operations with larger parameter sizes) may be ignored [17].…”
Section: Joint Optimization (mentioning)
confidence: 99%
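The joint-optimization issue described above can be illustrated with a toy mixed operation in which the architecture weights (alpha) and the model parameters share a single optimizer, rather than the bilevel scheme of standard DARTS. The candidate operations, sizes, and training loop are illustrative assumptions, not the authors' setup.

# Minimal sketch of joint optimization of architecture weights and model weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        # A "simple" candidate (identity) and a "heavier" candidate (linear).
        self.ops = nn.ModuleList([nn.Identity(), nn.Linear(dim, dim)])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

model = MixedOp()
head = nn.Linear(128, 10)
# Joint optimization: one optimizer over alpha and all model parameters.
opt = torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-3)

for step in range(100):
    x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
    loss = F.cross_entropy(head(model(x)), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Early in training the identity op needs no learning and so can attract a
# larger architecture weight than the not-yet-trained linear op, which is the
# dominance effect the citing authors describe.
print(F.softmax(model.alpha, dim=0))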