“…While classical learning theory suggests that over-parameterized models tend to overfit [12], recent advances have shown that an optimization algorithm may induce an implicit bias that regularizes the solution toward desired properties. Results of this type have led to new insights into, and a better understanding of, gradient descent for several fundamental problems, including logistic regression on linearly separable data [1], compressive sensing [2,3], sparse phase retrieval [4], nonlinear least squares [5], low-rank (deep) matrix factorization [6,7,8,9], and deep linear neural networks [10,11].…”
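As a concrete illustration of the implicit-bias phenomenon described above, consider the simplest over-parameterized setting: an under-determined least-squares problem. It is a classical fact that gradient descent initialized at zero converges to the minimum ℓ2-norm interpolating solution, even though the loss alone has infinitely many global minimizers. The following minimal numerical sketch (not taken from the cited works; it uses NumPy and synthetic Gaussian data purely for illustration) checks this behavior:

```python
import numpy as np

# Under-determined linear system: fewer equations (n) than unknowns (d),
# so the loss f(x) = 0.5 * ||A x - y||^2 has infinitely many global minima.
rng = np.random.default_rng(0)
n, d = 20, 100
A = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Gradient descent from x0 = 0. Every iterate stays in the row space of A,
# which is what biases the limit toward the minimum-norm solution.
x = np.zeros(d)
step = 1.0 / np.linalg.norm(A, 2) ** 2  # step size below 2/L for f
for _ in range(10_000):
    x -= step * A.T @ (A @ x - y)

# The minimum l2-norm interpolating solution, computed via the pseudoinverse.
x_min_norm = np.linalg.pinv(A) @ y

print(np.linalg.norm(A @ x - y))       # ~0: gradient descent interpolates the data
print(np.linalg.norm(x - x_min_norm))  # ~0: it found the minimum-norm solution
```

The key point is that no explicit regularizer appears in the objective; the ℓ2-norm minimality is induced entirely by the choice of algorithm and initialization, which is the phenomenon the works cited above extend to the nonlinear and deep settings.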