2018
DOI: 10.48550/arxiv.1806.07808
Preprint

Learning One-hidden-layer ReLU Networks via Gradient Descent

Abstract: We study the problem of learning one-hidden-layer neural networks with Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from standard Gaussian distribution and the outputs are generated from a noisy teacher network. We analyze the performance of gradient descent for training such kind of neural networks based on empirical risk minimization, and provide algorithm-dependent guarantees. In particular, we prove that tensor initialization followed by gradient descent can converge to th…
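
As a rough illustration of the setting described in the abstract, the sketch below generates data from a noisy one-hidden-layer ReLU teacher network with standard Gaussian inputs and runs plain gradient descent on the empirical squared risk. All dimensions, the noise level, the learning rate, and the random initialization are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative sizes only; not taken from the paper.
d, k, n = 10, 5, 5000          # input dimension, hidden width, sample size
rng = np.random.default_rng(0)

# Ground-truth (teacher) network parameters.
W_star = rng.normal(size=(k, d))
v_star = rng.normal(size=k)

# Inputs drawn from the standard Gaussian; labels from the noisy teacher.
X = rng.normal(size=(n, d))
y = np.maximum(X @ W_star.T, 0.0) @ v_star + 0.1 * rng.normal(size=n)

# Student network of the same shape. A small random initialization is used
# here as a stand-in for the tensor initialization analyzed in the paper,
# and the output layer is fixed to the teacher's for simplicity.
W = 0.1 * rng.normal(size=(k, d))
v = v_star.copy()

lr = 1e-3
for _ in range(2000):
    H = X @ W.T                          # pre-activations, shape (n, k)
    resid = np.maximum(H, 0.0) @ v - y   # residuals of the empirical risk
    # Gradient of (1/2n) * sum_i (v^T ReLU(W x_i) - y_i)^2 with respect to W.
    grad_W = ((resid[:, None] * (H > 0)) * v).T @ X / n
    W -= lr * grad_W

risk = 0.5 * np.mean((np.maximum(X @ W.T, 0.0) @ v - y) ** 2)
print(f"final empirical risk: {risk:.4f}")
```

The paper's guarantee concerns gradient descent started from a tensor-based initialization; the random initialization above is only a placeholder to keep the sketch self-contained.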

Cited by 31 publications (20 citation statements)
References 36 publications (80 reference statements)

“…A series of papers made strong assumptions on input distribution as well as realizability of labels, and showed global convergence of (stochastic) gradient descent for some shallow neural networks (Tian, 2017; Soltanolkotabi, 2017; Brutzkus & Globerson, 2017; Du et al., 2017a,b; Li & Yuan, 2017). Some local convergence results have also been proved (Zhong et al., 2017; Zhang et al., 2018). However, these assumptions are not satisfied in practice.…”
Section: Related Work
confidence: 99%
“…However, cost functions that are optimized by neural networks might not meet this condition. So, in theory, those neural networks don't guarantee globally optimal solutions, but in practice, neural networks converge to a local minimum point as proven by Zhong et al. [27], [28] and Zhang et al. [29]. Just like the cases of Naive Bayes and Gradient Descent, we also make assumptions in our paper that help explain our method, but do not have to hold in order to achieve good results in practice.…”
Section: Assumptions
confidence: 98%
“…This assumption is also made in the practical work [58]. Moreover, there is a large body of works that directly use GANs or deconvolution networks for super-resolution [31, 61, 72, 90, 94].…”
Section: Forward Super-resolution: A Special Property of Images
confidence: 99%