Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small gap between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization and occurs even if we replace the true images with completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth-two neural networks already have perfect finite-sample expressivity as soon as the number of parameters exceeds the number of data points, as it usually does in practice. We interpret our experimental findings by comparison with traditional models. We supplement this republication with a new section at the end summarizing recent progress in the field since the original version of this paper.
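The abstract's central claim, that an overparameterized network easily fits fully random labels, can be illustrated with a minimal sketch. The two-layer ReLU network, the plain-numpy training loop, and all sizes below are illustrative assumptions, not the paper's actual experimental setup (which used state-of-the-art convolutional networks on image data):

```python
import numpy as np

# Toy sketch: a depth-two ReLU network with far more parameters (~700)
# than data points (20) fits completely random labels on random inputs.
rng = np.random.default_rng(0)
n, d, h = 20, 10, 64
X = rng.standard_normal((n, d))          # unstructured random "images"
y = rng.choice([-1.0, 1.0], size=n)      # random labels

W1 = 0.1 * rng.standard_normal((d, h)); b1 = np.zeros(h)
w2 = 0.1 * rng.standard_normal(h);      b2 = 0.0

lr = 0.05
for _ in range(10_000):                  # full-batch gradient descent on MSE
    z = X @ W1 + b1
    a = np.maximum(z, 0.0)               # ReLU activations
    p = a @ w2 + b2
    g = 2.0 * (p - y) / n                # dLoss/dp
    ga = np.outer(g, w2) * (z > 0)       # backprop through ReLU
    W1 -= lr * (X.T @ ga); b1 -= lr * ga.sum(0)
    w2 -= lr * (a.T @ g);  b2 -= lr * g.sum()

pred = np.sign(np.maximum(X @ W1 + b1, 0.0) @ w2 + b2)
train_acc = float(np.mean(pred == y))    # typically fits all 20 random labels
print(train_acc)
```

Because there is no structure linking inputs to labels, the near-perfect training accuracy here says nothing about generalization, which is exactly the tension the paper examines.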
In many data analysis tasks, one is often confronted with very high dimensional data. Feature selection techniques are designed to find the relevant feature subset of the original features which can facilitate clustering, classification and retrieval. In this paper, we consider the feature selection problem in the unsupervised learning scenario, which is particularly difficult due to the absence of class labels that would guide the search for relevant information. The feature selection problem is essentially a combinatorial optimization problem which is computationally expensive. Traditional unsupervised feature selection methods address this issue by selecting the top ranked features based on certain scores computed independently for each feature. These approaches neglect the possible correlation between different features and thus cannot produce an optimal feature subset. Inspired by recent developments in manifold learning and L1-regularized models for subset selection, we propose in this paper a new approach, called Multi-Cluster Feature Selection (MCFS), for unsupervised feature selection. Specifically, we select those features such that the multi-cluster structure of the data can be best preserved. The corresponding optimization problem can be efficiently solved since it only involves a sparse eigen-problem and an L1-regularized least squares problem. Extensive experimental results over various real-life data sets have demonstrated the superiority of the proposed algorithm.
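The two-step procedure the abstract describes (a sparse eigen-problem followed by L1-regularized least squares) can be sketched in plain numpy. The toy data, Gaussian affinity bandwidth, regularization strength, and the simple coordinate-descent Lasso below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: two clusters separated along feature 0; features 1-4 are noise.
n = 40
X = rng.normal(0, 1, (n, 5))
X[:n // 2, 0] += 5.0

# Step 1: eigenvectors of the normalized graph Laplacian (the eigen-problem)
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / d2.mean())              # Gaussian affinity, heuristic bandwidth
D = W.sum(1)
L = np.eye(n) - W / np.sqrt(np.outer(D, D))
_, vecs = np.linalg.eigh(L)
Y = vecs[:, 1:3]                         # smoothest nontrivial eigenvectors

# Step 2: L1-regularized least squares of each eigenvector on the features
def lasso_cd(X, y, lam, iters=200):
    """Lasso via naive coordinate descent with soft-thresholding."""
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(0)
    for _ in range(iters):
        for j in range(X.shape[1]):
            r = y - X @ w + X[:, j] * w[j]   # residual excluding feature j
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

Xc = (X - X.mean(0)) / X.std(0)
coefs = np.stack([lasso_cd(Xc, Y[:, k], lam=1.0) for k in range(Y.shape[1])])
scores = np.abs(coefs).max(0)            # score: max |coefficient| per feature
top_feature = int(scores.argmax())
print(top_feature)                       # feature 0 should rank first
```

Because the sparse regression considers all features jointly, a feature is scored by how much it contributes to reconstructing the cluster structure, rather than by a score computed for it in isolation.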
After bariatric surgery, hip bone loss reflects skeletal unloading and cortical bone loss reflects secondary hyperparathyroidism. This study highlights cortical bone deterioration as a novel mechanism for bone loss after bariatric surgery.
Measurement of areal bone mineral density (aBMD) by dual-energy x-ray absorptiometry (DXA) has been shown to predict fracture risk. High-resolution peripheral quantitative computed tomography (HR-pQCT) yields additional information about volumetric BMD (vBMD), microarchitecture, and strength that may increase understanding of fracture susceptibility. Women with (n = 68) and without (n = 101) a history of postmenopausal fragility fracture had aBMD measured by DXA and trabecular and cortical vBMD and trabecular microarchitecture of the radius and tibia measured by HR-pQCT. Finite-element analysis (FEA) of HR-pQCT scans was performed to estimate bone stiffness. DXA T-scores were similar in women with and without fracture at the spine, hip, and one-third radius but lower in patients with fracture at the ultradistal radius (p < .01). At the radius, fracture patients had lower total density, cortical thickness, and trabecular density, number, and thickness, and higher trabecular separation and network heterogeneity (p < .0001 to .04). At the tibia, total, cortical, and trabecular density and cortical and trabecular thickness were lower in fracture patients (p < .0001 to .03). The differences between groups were greater at the radius than at the tibia for inner trabecular density, number, trabecular separation, and network heterogeneity (p < .01 to .05). Stiffness was reduced in fracture patients, more markedly at the radius (41% to 44%) than at the tibia (15% to 20%). Women with fractures had reduced vBMD, microarchitectural deterioration, and decreased strength. These differences were more prominent at the radius than at the tibia. HR-pQCT and FEA measurements of peripheral sites are associated with fracture prevalence and may increase understanding of the role of microarchitectural deterioration in fracture susceptibility. © 2010 American Society for Bone and Mineral Research.