The objective of this research is to push the frontiers of Automated Machine Learning, specifically targeting Deep Learning. We analyse ChaLearn's Automated Deep Learning challenge, whose design features include: (i) code submissions tested entirely blind, on five classification problems during development and ten others during final testing; (ii) raw data from various modalities (image, video, text, speech, tabular), formatted as tensors; (iii) emphasis on "any-time learning" strategies, imposing fixed time/memory resources and using the Area under the Learning Curve as the metric; (iv) baselines provided, including "Baseline 3", which combines top-ranked solutions of past rounds (AutoCV, AutoNLP, AutoSpeech, and AutoSeries); (v) no Deep Learning imposed. Principal findings: (1) the top two winners passed all final tests without failure, a significant step towards true automation, and their solutions were open-sourced; (2) despite our effort to format all datasets uniformly to encourage generic solutions, participants adopted modality-specific workflows; (3) any-time learning was addressed successfully, without sacrificing final performance; (4) although some solutions improved over Baseline 3, it strongly influenced many; (5) Deep Learning solutions dominated, but Neural Architecture Search proved impractical within the imposed time budget, so most solutions relied on fixed-architecture pre-trained networks with fine-tuning. Ablation studies revealed the importance of meta-learning, ensembling, and efficient data loading, whereas data augmentation was not critical.
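As a hedged illustration of the any-time metric in point (iii), the sketch below integrates a step function of best-so-far scores over normalized time, in the spirit of the Area under the Learning Curve; the function name and the simplified linear time scale are assumptions for illustration and stand in for the challenge's exact time transformation.

import numpy as np

def area_under_learning_curve(timestamps, scores, time_budget):
    # Any-time performance: integrate the step function of best-so-far
    # scores over normalized time in [0, 1]. Illustrative sketch only.
    t = np.clip(np.asarray(timestamps, dtype=float) / time_budget, 0.0, 1.0)
    s = np.maximum.accumulate(np.asarray(scores, dtype=float))  # best so far
    widths = np.diff(np.concatenate([t, [1.0]]))  # each score holds until the next
    return float(np.sum(s * widths))

# Earlier good scores yield a larger area than the same scores arriving late.
print(area_under_learning_curve([60, 300, 900], [0.4, 0.6, 0.7], 1200))

Under such a metric, a solution that reaches a decent score quickly can beat one with a marginally better final score, which is exactly the behaviour that an any-time learning evaluation rewards.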
Bayesian optimization (BO) has become an established framework and popular tool for hyperparameter optimization (HPO) of machine learning (ML) algorithms. While known for its sample efficiency, vanilla BO cannot utilize readily available prior beliefs the practitioner has about the potential location of the optimum. BO thus disregards a valuable source of information, reducing its appeal to ML practitioners. To address this issue, we propose πBO, an acquisition-function generalization that incorporates prior beliefs about the location of the optimum in the form of a probability distribution provided by the user. In contrast to previous approaches, πBO is conceptually simple and can easily be integrated with existing libraries and many acquisition functions. We provide regret bounds when πBO is applied to the common Expected Improvement acquisition function and prove convergence at regular rates independently of the prior. Further, our experiments show that πBO outperforms competing approaches across a wide suite of benchmarks and prior characteristics. We also demonstrate that πBO improves on the state-of-the-art performance for a popular deep learning task, with a 12.5× time-to-accuracy speedup over prominent BO approaches.
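To make the mechanism concrete, here is a minimal sketch of prior-weighted Expected Improvement in the spirit of πBO: the user supplies a probability density π(x) over the optimum's location, and the acquisition value is multiplied by π(x)^(β/n), so the prior dominates early and its influence decays as the iteration count n grows. The placeholder surrogate posterior, the Gaussian prior, and all names below are illustrative assumptions, not the paper's reference implementation.

import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    # Standard EI for minimization, given a GP posterior mean and std.
    sigma = np.maximum(sigma, 1e-12)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def pi_bo_acquisition(mu, sigma, best, prior_pdf, x, n, beta=10.0):
    # Prior-weighted EI: the exponent beta/n anneals the prior's influence
    # away as observations accumulate, approaching vanilla EI in the limit.
    return expected_improvement(mu, sigma, best) * prior_pdf(x) ** (beta / n)

# Example: the user believes the optimum lies near x = 0.3.
prior = lambda x: norm.pdf(x, loc=0.3, scale=0.1)
xs = np.linspace(0.0, 1.0, 5)
mu, sigma = np.zeros_like(xs), np.ones_like(xs)  # placeholder GP posterior
print(pi_bo_acquisition(mu, sigma, best=0.0, prior_pdf=prior, x=xs, n=5))

Because the weighting is just a multiplicative factor on the acquisition value, it can drop into any library that exposes the acquisition step, which is what makes this kind of approach easy to integrate.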
After developer adjustments to a machine learning (ML) algorithm, how can the results of an old hyperparameter optimization (HPO) automatically be used to speed up a new HPO? This question poses a challenging problem, as developer adjustments can change which hyperparameter settings perform well, or even the hyperparameter search space itself. While many approaches exist that leverage knowledge obtained on previous tasks, knowledge from previous development steps has so far remained entirely untapped. In this work, we remedy this situation and propose a new research framework: hyperparameter transfer across adjustments (HT-AA). To lay a solid foundation for this research framework, we provide four simple HT-AA baseline algorithms and eight benchmarks covering changes to various aspects of ML algorithms, their hyperparameter search spaces, and the neural architectures used. The best baseline, on average and depending on the budgets for the old and new HPO, reaches a given performance 1.2–2.6× faster than a prominent HPO algorithm without transfer. As HPO is a crucial step in ML development but requires extensive computational resources, this speedup would lead to faster development cycles, lower costs, and reduced environmental impacts. To make these benefits available to ML developers off-the-shelf and to facilitate future research on HT-AA, we provide Python packages for our baselines and benchmarks.
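As a hedged sketch of the simplest form of transfer across adjustments, the code below first evaluates the old HPO's incumbent, projected into the new search space (keeping hyperparameters that survived the adjustment, taking defaults for the rest), before falling back to plain random search. This mirrors the spirit of a best-first transfer baseline; the function names, the discrete search-space encoding, and the random-search fallback are all assumptions for illustration, not the paper's packages.

import random

def project_config(old_best, new_space, defaults):
    # Keep old values for hyperparameters still present (and still valid)
    # after the adjustment; use defaults for new or changed ones.
    return {name: old_best[name]
                  if name in old_best and old_best[name] in choices
                  else defaults[name]
            for name, choices in new_space.items()}

def best_first_hpo(objective, new_space, defaults, old_best, budget):
    # Evaluate the projected old incumbent first, then random search.
    best_cfg = project_config(old_best, new_space, defaults)
    best_loss = objective(best_cfg)
    for _ in range(budget - 1):
        cfg = {k: random.choice(v) for k, v in new_space.items()}
        loss = objective(cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

If the adjustment left the good region of the search space roughly in place, the very first evaluation is already near-optimal, which is the kind of effect that produces speedups of the reported magnitude.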