“…Our motivations, contributions and methods. It was recently discovered in [44,43,29,31,55,36,27,60,30,14] that SGD algorithms can be (weakly) approximated by continuous-time SDEs. These SDEs often offer much-needed insight into the algorithms under consideration; for instance, the continuous-time treatment allows stochastic control theory to be applied to develop novel adaptive algorithms [64,66].…”