Much previous work on decision trees has focused on splitting criteria and the optimization of tree size. The dilemma between overfitting and achieving maximum accuracy is seldom resolved. A method to construct a decision-tree-based classifier is proposed that maintains the highest accuracy on training data and improves in generalization accuracy as it grows in complexity. The classifier consists of multiple trees constructed systematically by pseudorandomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces. The subspace method is compared to single-tree classifiers and other forest construction methods in experiments on publicly available datasets, where the method's superiority is demonstrated. We also discuss independence between trees in a forest and relate it to the combined classification accuracy.
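The construction described in this abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation, and several choices are assumptions made for illustration: the names `build_tree`, `build_forest`, and `predict_forest` are hypothetical, splits are chosen by Gini impurity, each tree sees half of the features by default, and the trees are combined by simple majority vote. Each tree is grown until its leaves are pure, using only the feature indices drawn for it.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, features):
    """Grow a tree to purity, splitting only on the given feature indices."""
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}
    best = None
    for f in features:
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds: every distinct value except the largest,
        # so both sides of the split are always non-empty.
        for t in values[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = gini(left) * len(left) + gini(right) * len(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        # Points coincide in this subspace: fall back to the majority label.
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    _, f, t = best
    left_idx = [i for i, r in enumerate(rows) if r[f] <= t]
    right_idx = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "f": f,
        "t": t,
        "left": build_tree([rows[i] for i in left_idx],
                           [labels[i] for i in left_idx], features),
        "right": build_tree([rows[i] for i in right_idx],
                            [labels[i] for i in right_idx], features),
    }


def predict_tree(node, row):
    """Follow the split thresholds down to a leaf and return its label."""
    while "leaf" not in node:
        node = node["left"] if row[node["f"]] <= node["t"] else node["right"]
    return node["leaf"]


def build_forest(rows, labels, n_trees=25, k=None, seed=0):
    """Build n_trees trees, each in a pseudorandomly chosen k-feature subspace."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = k or max(1, n_features // 2)  # assumption: half the features by default
    return [
        build_tree(rows, labels, rng.sample(range(n_features), k))
        for _ in range(n_trees)
    ]


def predict_forest(forest, row):
    """Combine the trees by majority vote (a simplification for this sketch)."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]
```

Because every tree is trained on all samples and grown until its leaves are pure, each tree reproduces the training labels whenever the training points remain distinguishable in its subspace. Diversity between trees comes only from the different feature subsets, which is the property the abstract relates to combined accuracy.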
Machine learning is a popular topic in data analysis and modeling. Many different machine learning algorithms have been developed and implemented in a variety of programming languages over the past 20 years. In this article, we first provide an overview of machine learning and clarify its difference from statistical inference. Then, we review Scikit-learn, a machine learning package in the Python programming language that is widely used in data science. The Scikit-learn package includes implementations of a comprehensive list of machine learning methods under unified data and modeling procedure conventions, making it a convenient toolkit for educational and behavioral statisticians.
An investigation of the surfaces of linear, segmented block copolymers of poly(dimethylsiloxane−urea−urethanes) by dynamic contact angle analysis is reported. The polymer films are immersed in water, the time-dependent advancing and receding contact angles are observed, and the contact angle hysteresis is reported. The initially hydrophobic polymer surfaces are observed to become more hydrophilic with long-term exposure to water. The advancing contact angles are relatively constant with immersion time; the receding contact angles decrease to some equilibrium value after a few days' exposure to water. It is proposed that the surfaces reorganize by a mechanism in which the hard block urethane−urea domains migrate through the soft block silicone to the polymer−water interface. The surface reorganization kinetics are discussed in terms of the effects of annealing as well as the average molecular weight of the soft block.