Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small gap between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization and occurs even if we replace the true images with completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth-two neural networks already have perfect finite-sample expressivity as soon as the number of parameters exceeds the number of data points, as it usually does in practice. We interpret our experimental findings by comparison with traditional models. We supplement this republication with a new section at the end summarizing recent progress in the field since the original version of this paper.
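The abstract's central claim, that an overparameterized network easily fits fully random labels, can be illustrated with a minimal sketch. The two-layer ReLU network, the plain-numpy training loop, and all sizes below are illustrative assumptions, not the paper's actual experimental setup (which used state-of-the-art convolutional networks on image data):

```python
import numpy as np

# Toy sketch: a depth-two ReLU network with far more parameters (~700)
# than data points (20) fits completely random labels on random inputs.
rng = np.random.default_rng(0)
n, d, h = 20, 10, 64
X = rng.standard_normal((n, d))          # unstructured random "images"
y = rng.choice([-1.0, 1.0], size=n)      # random labels

W1 = 0.1 * rng.standard_normal((d, h)); b1 = np.zeros(h)
w2 = 0.1 * rng.standard_normal(h);      b2 = 0.0

lr = 0.05
for _ in range(10_000):                  # full-batch gradient descent on MSE
    z = X @ W1 + b1
    a = np.maximum(z, 0.0)               # ReLU activations
    p = a @ w2 + b2
    g = 2.0 * (p - y) / n                # dLoss/dp
    ga = np.outer(g, w2) * (z > 0)       # backprop through ReLU
    W1 -= lr * (X.T @ ga); b1 -= lr * ga.sum(0)
    w2 -= lr * (a.T @ g);  b2 -= lr * g.sum()

pred = np.sign(np.maximum(X @ W1 + b1, 0.0) @ w2 + b2)
train_acc = float(np.mean(pred == y))    # typically fits all 20 random labels
print(train_acc)
```

Because there is no structure linking inputs to labels, the near-perfect training accuracy here says nothing about generalization, which is exactly the tension the paper examines.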
In many data analysis tasks, one is often confronted with very high dimensional data. Feature selection techniques are designed to find the relevant feature subset of the original features which can facilitate clustering, classification and retrieval. In this paper, we consider the feature selection problem in the unsupervised learning scenario, which is particularly difficult due to the absence of class labels that would guide the search for relevant information. The feature selection problem is essentially a combinatorial optimization problem which is computationally expensive. Traditional unsupervised feature selection methods address this issue by selecting the top ranked features based on certain scores computed independently for each feature. These approaches neglect the possible correlation between different features and thus cannot produce an optimal feature subset. Inspired by recent developments in manifold learning and L1-regularized models for subset selection, we propose in this paper a new approach, called Multi-Cluster Feature Selection (MCFS), for unsupervised feature selection. Specifically, we select those features such that the multi-cluster structure of the data can be best preserved. The corresponding optimization problem can be efficiently solved since it only involves a sparse eigen-problem and an L1-regularized least squares problem. Extensive experimental results over various real-life data sets have demonstrated the superiority of the proposed algorithm.
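The two-step procedure the abstract describes (a sparse eigen-problem followed by L1-regularized least squares) can be sketched in plain numpy. The toy data, Gaussian affinity bandwidth, regularization strength, and the simple coordinate-descent Lasso below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: two clusters separated along feature 0; features 1-4 are noise.
n = 40
X = rng.normal(0, 1, (n, 5))
X[:n // 2, 0] += 5.0

# Step 1: eigenvectors of the normalized graph Laplacian (the eigen-problem)
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / d2.mean())              # Gaussian affinity, heuristic bandwidth
D = W.sum(1)
L = np.eye(n) - W / np.sqrt(np.outer(D, D))
_, vecs = np.linalg.eigh(L)
Y = vecs[:, 1:3]                         # smoothest nontrivial eigenvectors

# Step 2: L1-regularized least squares of each eigenvector on the features
def lasso_cd(X, y, lam, iters=200):
    """Lasso via naive coordinate descent with soft-thresholding."""
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(0)
    for _ in range(iters):
        for j in range(X.shape[1]):
            r = y - X @ w + X[:, j] * w[j]   # residual excluding feature j
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

Xc = (X - X.mean(0)) / X.std(0)
coefs = np.stack([lasso_cd(Xc, Y[:, k], lam=1.0) for k in range(Y.shape[1])])
scores = np.abs(coefs).max(0)            # score: max |coefficient| per feature
top_feature = int(scores.argmax())
print(top_feature)                       # feature 0 should rank first
```

Because the sparse regression considers all features jointly, a feature is scored by how much it contributes to reconstructing the cluster structure, rather than by a score computed for it in isolation.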
After bariatric surgery, hip bone loss reflects skeletal unloading and cortical bone loss reflects secondary hyperparathyroidism. This study highlights cortical bone deterioration as a novel mechanism for bone loss after bariatric surgery.
Measurement of areal bone mineral density (aBMD) by dual-energy x-ray absorptiometry (DXA) has been shown to predict fracture risk. High-resolution peripheral quantitative computed tomography (HR-pQCT) yields additional information about volumetric BMD (vBMD), microarchitecture, and strength that may increase understanding of fracture susceptibility. Women with (n = 68) and without (n = 101) a history of postmenopausal fragility fracture had aBMD measured by DXA and trabecular and cortical vBMD and trabecular microarchitecture of the radius and tibia measured by HR-pQCT. Finite-element analysis (FEA) of HR-pQCT scans was performed to estimate bone stiffness. DXA T-scores were similar in women with and without fracture at the spine, hip, and one-third radius but lower in patients with fracture at the ultradistal radius (p < .01). At the radius, fracture patients had lower total density, cortical thickness, and trabecular density, number, and thickness, and higher trabecular separation and network heterogeneity (p < .0001 to .04). At the tibia, total, cortical, and trabecular density and cortical and trabecular thickness were lower in fracture patients (p < .0001 to .03). The differences between groups were greater at the radius than at the tibia for inner trabecular density, number, trabecular separation, and network heterogeneity (p < .01 to .05). Stiffness was reduced in fracture patients, more markedly at the radius (41% to 44%) than at the tibia (15% to 20%). Women with fractures had reduced vBMD, microarchitectural deterioration, and decreased strength. These differences were more prominent at the radius than at the tibia. HR-pQCT and FEA measurements of peripheral sites are associated with fracture prevalence and may increase understanding of the role of microarchitectural deterioration in fracture susceptibility. © 2010 American Society for Bone and Mineral Research.