Experimental results show that the predictive performance of models tuned with Random Search is equivalent to that of models tuned with meta-heuristics and Grid Search, but at a lower computational cost.
We advocate the use of curated, comprehensive benchmark suites of machine learning datasets, backed by standardized OpenML-based interfaces and complementary software toolkits written in Python, Java, and R. Major distinguishing features of OpenML benchmark suites are (a) ease of use through standardized data formats, APIs, and existing client libraries; (b) machine-readable meta-information regarding the contents of the suite; and (c) online sharing of results, enabling large-scale comparisons. As a first such suite, we propose the OpenML100, a machine learning benchmark suite of 100 classification datasets carefully curated from the thousands of datasets available on OpenML.org.
Abstract: Many classification algorithms, such as Neural Networks and Support Vector Machines, have a range of hyper-parameters that may strongly affect the predictive performance of the models induced by them. Hence, it is recommended to define the values of these hyper-parameters using optimization techniques. While these techniques usually converge to a good set of values, they typically have a high computational cost, because many candidate sets of values are evaluated during the optimization process. It is often not clear whether this will result in parameter settings that are significantly better than the default settings. When training time is limited, it may help to know when these parameters should definitely be tuned. In this study, we use meta-learning to predict when optimization techniques are expected to lead to models whose predictive performance is better than those obtained by using default parameter settings. Hence, we can choose to employ optimization techniques only when they are expected to improve performance, thus reducing the overall computational cost. We evaluate these meta-learning techniques on more than one hundred data sets. The experimental results show that it is possible to accurately predict when optimization techniques should be used instead of default values suggested by some machine learning libraries.
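The trade-off the abstract describes, tuned hyper-parameters versus library defaults, can be illustrated with a minimal scikit-learn sketch; scikit-learn is our choice here for illustration, not necessarily the library used in the study, and the synthetic dataset and search space are arbitrary. Whether the tuned model actually beats the defaults depends on the dataset, which is exactly the question the meta-learning approach tries to answer in advance.

```python
# Sketch: default SVM vs. random-search-tuned SVM on synthetic data.
# Random search evaluates n_iter candidate settings, so its cost grows
# with the budget -- the motivation for predicting when tuning pays off.
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Baseline: the library's default hyper-parameter settings.
default_score = SVC().fit(X_tr, y_tr).score(X_te, y_te)

# Tuned: 20 candidate settings drawn log-uniformly from the search space.
param_dist = {"C": loguniform(1e-2, 1e3), "gamma": loguniform(1e-4, 1e0)}
search = RandomizedSearchCV(SVC(), param_dist, n_iter=20, cv=3,
                            random_state=0)
search.fit(X_tr, y_tr)
tuned_score = search.score(X_te, y_te)

print(f"default: {default_score:.3f}  tuned: {tuned_score:.3f}")
```

Note that the tuned run fits 20 x 3 = 60 models plus a final refit, versus a single fit for the defaults; when the two scores are close, that extra cost buys nothing, which is the situation the meta-learner is meant to detect.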