ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9413630

Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks

Abstract: Deep neural network (DNN) based automatic speech recognition (ASR) systems are often designed using expert knowledge and empirical evaluation. In this paper, a range of neural architecture search (NAS) techniques are used to automatically learn two types of hyperparameters of state-of-the-art factored time delay neural networks (TDNNs): i) the left and right splicing context offsets; and ii) the dimensionality of the bottleneck linear projection at each hidden layer. These include the DARTS method integratin…
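As a rough illustration of the search space described in the abstract (not the authors' implementation), the PyTorch sketch below shows a DARTS-style softmax relaxation over candidate bottleneck projection dimensions for a single factored TDNN layer. The layer widths, candidate dimensions, and the weight-sharing scheme (a shared full-width projection whose first d outputs realise a d-dimensional candidate) are assumptions made for illustration; the splicing context offsets could be relaxed in the same way, with one architecture weight per candidate offset set.

# Minimal sketch (not the authors' code) of a DARTS-style softmax relaxation
# over candidate bottleneck dimensions for one factored TDNN (TDNN-F) layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SearchableBottleneck(nn.Module):
    def __init__(self, in_dim=1536, max_bottleneck=256,
                 candidate_dims=(64, 128, 192, 256)):
        super().__init__()
        self.candidate_dims = candidate_dims
        # Shared projection; a candidate of width d keeps its first d outputs.
        self.proj = nn.Linear(in_dim, max_bottleneck, bias=False)
        # One architecture parameter (alpha) per candidate width.
        self.alpha = nn.Parameter(torch.zeros(len(candidate_dims)))

    def forward(self, x):
        full = self.proj(x)                       # (batch, max_bottleneck)
        weights = F.softmax(self.alpha, dim=0)    # architecture weights
        out = torch.zeros_like(full)
        for w, d in zip(weights, self.candidate_dims):
            # Zero out dimensions beyond d so this candidate acts as a
            # d-dimensional bottleneck, then mix candidates by softmax weight.
            masked = torch.cat([full[:, :d],
                                torch.zeros_like(full[:, d:])], dim=1)
            out = out + w * masked
        return out

    def selected_dim(self):
        # After the search, keep the candidate with the largest weight.
        return self.candidate_dims[int(self.alpha.argmax())]

layer = SearchableBottleneck()
y = layer(torch.randn(8, 1536))
print(y.shape, layer.selected_dim())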

Citations: cited by 7 publications (10 citation statements)
References: 25 publications
“…The random search, which selects the best-performing architecture from 5 randomly sampled architectures, turns out to be a considerably strong baseline that achieves performance comparable to the system found by joint optimization. This coincides with observations in previous literature [12,17]. We also need to note that the baseline architectures are already optimized by the authors, which provides a sufficiently good starting point for the random search.…”
Section: Results (supporting)
confidence: 88%
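The random-search baseline mentioned in the statement above can be sketched as follows: draw a handful of architectures from the search space, evaluate each, and keep the best. The helper names (sample_architecture, train_and_evaluate) and the search-space choices below are hypothetical placeholders, not taken from the cited work.

# Minimal sketch of a random-search baseline over a TDNN-F-style search space.
import random

def sample_architecture(rng, num_layers=12,
                        context_choices=(0, 1, 3),
                        dim_choices=(64, 128, 192, 256)):
    # One (context offset, bottleneck dim) choice per layer.
    return [{"context": rng.choice(context_choices),
             "bottleneck_dim": rng.choice(dim_choices)}
            for _ in range(num_layers)]

def train_and_evaluate(arch):
    # Placeholder: in practice, build and train the TDNN-F for this
    # architecture and return its dev-set word error rate.
    return random.uniform(9.0, 12.0)

def random_search(num_samples=5, seed=0):
    rng = random.Random(seed)
    candidates = [sample_architecture(rng) for _ in range(num_samples)]
    scored = [(train_and_evaluate(a), a) for a in candidates]
    return min(scored, key=lambda s: s[0])   # lowest WER wins

best_wer, best_arch = random_search()
print(best_wer, best_arch[0])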
“…Jointly training the architecture weights and the model parameters saves time. However, the sub-optimal operations (e.g., simpler operations) may dominate the weights at an early stage, such that the optimal operations (e.g., operations with larger parameter sizes) may be ignored [17].…”
Section: Joint Optimization (mentioning)
confidence: 99%
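The joint-optimization issue described above can be illustrated with a toy mixed operation in which the architecture weights (alpha) and the model parameters share a single optimizer, rather than the bilevel scheme of standard DARTS. The candidate operations, sizes, and training loop are illustrative assumptions, not the authors' setup.

# Minimal sketch of joint optimization of architecture weights and model weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        # A "simple" candidate (identity) and a "heavier" candidate (linear).
        self.ops = nn.ModuleList([nn.Identity(), nn.Linear(dim, dim)])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

model = MixedOp()
head = nn.Linear(128, 10)
# Joint optimization: one optimizer over alpha and all model parameters.
opt = torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-3)

for step in range(100):
    x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
    loss = F.cross_entropy(head(model(x)), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Early in training the identity op needs no learning and so can attract a
# larger architecture weight than the not-yet-trained linear op, which is the
# dominance effect the citing authors describe.
print(F.softmax(model.alpha, dim=0))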