2021
DOI: 10.1016/j.neunet.2021.05.011

A distributed optimisation framework combining natural gradient with Hessian-free for discriminative sequence training


Cited by 5 publications (1 citation statement)
References 56 publications
“…Besides the low-resource problem, how to train an ASR system with a large amount of data is also critically important. Haider et al. [13] presented a novel Natural Gradient and Hessian-Free (NGHF) optimisation framework for neural network training that can operate efficiently in a distributed manner. Their experiments show that NGHF not only achieves larger word error rate reductions than standard stochastic gradient descent or Adam, but also requires orders of magnitude fewer parameter updates.…”
mentioning
confidence: 99%
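As context for the cited approach, the sketch below illustrates the general idea behind natural-gradient and Hessian-free style updates: the gradient is preconditioned by a curvature matrix (Fisher or Gauss-Newton) that is never formed explicitly, only accessed through matrix-vector products inside a conjugate gradient solve. This is a minimal illustrative example, not the NGHF implementation of Haider et al.; the function names, damping constant, and toy quadratic problem are assumptions made here for demonstration only.

```python
# Minimal sketch (not the authors' NGHF code): precondition the gradient with a
# Fisher/Gauss-Newton curvature matrix using only matrix-vector products and
# conjugate gradient (CG). All names and constants are illustrative assumptions.
import numpy as np

def conjugate_gradient(mvp, g, max_iters=50, tol=1e-10, damping=1e-3):
    """Approximately solve (F + damping*I) d = g with CG.

    mvp(v) must return F @ v, where F is the curvature matrix
    (e.g. the Fisher matrix in natural gradient, or the
    Gauss-Newton matrix in Hessian-free optimisation).
    """
    d = np.zeros_like(g)
    r = g.copy()                      # residual g - (F + damping*I) @ d, with d = 0
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iters):
        Fp = mvp(p) + damping * p
        alpha = rs_old / (p @ Fp)
        d += alpha * p
        r -= alpha * Fp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return d

# Toy usage: a quadratic loss whose curvature matrix can be formed explicitly.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 20))
F = A @ A.T + 0.1 * np.eye(20)        # stand-in for a Fisher/Gauss-Newton matrix
theta = rng.standard_normal(20)
grad = F @ theta                      # gradient of 0.5 * theta^T F theta

direction = conjugate_gradient(lambda v: F @ v, grad)
theta -= 1.0 * direction              # one curvature-preconditioned update step
print("gradient norm after update:", np.linalg.norm(F @ theta))
```

In a real distributed setting, the curvature-vector products and gradients would be accumulated across workers before the CG solve, which is one reason such second-order methods can reach a given word error rate in far fewer parameter updates than SGD or Adam.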