SUMMARY
Over the past 15 years the linear learning machine has been applied to a large number of chemical problems. The learning machine approach is conceptually simple and does not require knowledge of the statistical distribution of the data. However, there are problems associated with this approach. One problem that has not been investigated is the influence of mislabeled samples on the positioning of the hyperplane in feature space. If a few samples in a data set are incorrectly tagged prior to training (i.e. the samples are labeled as members of class 2 even though they are actually members of class 1), it is still possible using the linear learning machine to achieve a classification success rate of 100% for the training set. However, unfavorable results will be obtained for the prediction set. The magnitude of this effect and its potential implications regarding the proper use of the linear learning machine are discussed.
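The effect described above can be illustrated with a minimal sketch. It is not the authors' experiment: the synthetic Gaussian data, the 50-feature and 20-sample sizes, the three flipped labels, and all function names (`make_data`, `train_llm`, `accuracy`, `run_experiment`) are assumptions chosen for illustration, and a plain fixed-increment error-correction (perceptron) rule stands in for the linear learning machine. With more features than training samples, the rule can usually fit the mislabeled training set perfectly, while success on an independent prediction set can then be compared against a model trained on correctly tagged samples.

```python
import random


def make_data(n_per_class, n_features, rng, shift=2.0):
    """Two Gaussian classes that differ only in the mean of feature 0."""
    X, y = [], []
    for label in (+1, -1):  # +1 = class 1, -1 = class 2
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += shift * label
            X.append(row + [1.0])  # trailing 1.0 acts as the bias input
            y.append(label)
    return X, y


def train_llm(X, y, max_epochs=10000):
    """Error-correction training; returns (weights, converged)."""
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            s = sum(wj * xj for wj, xj in zip(w, xi))
            if yi * s <= 0.0:  # misclassified, or on the hyperplane
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                errors += 1
        if errors == 0:  # a full pass with no corrections
            return w, True
    return w, False


def accuracy(w, X, y):
    """Fraction of samples whose predicted class matches y."""
    hits = 0
    for xi, yi in zip(X, y):
        s = sum(wj * xj for wj, xj in zip(w, xi))
        hits += (1 if s > 0.0 else -1) == yi
    return hits / len(y)


def run_experiment(seed=0, n_per_class=10, n_features=50, n_flipped=3):
    """Train on mislabeled and on clean tags; report success rates."""
    rng = random.Random(seed)
    X_train, y_true = make_data(n_per_class, n_features, rng)
    X_pred, y_pred = make_data(200, n_features, rng)

    # Tag the first n_flipped class-1 samples as class 2 (assumed setup).
    y_tagged = list(y_true)
    for i in range(n_flipped):
        y_tagged[i] = -y_tagged[i]

    w_bad, conv_bad = train_llm(X_train, y_tagged)
    w_ok, conv_ok = train_llm(X_train, y_true)
    return {
        "converged": conv_bad and conv_ok,
        "train_vs_tags": accuracy(w_bad, X_train, y_tagged),
        "train_vs_truth": accuracy(w_bad, X_train, y_true),
        "prediction_mislabeled": accuracy(w_bad, X_pred, y_pred),
        "prediction_clean": accuracy(w_ok, X_pred, y_pred),
    }
```

Calling `run_experiment()` returns the training success rate against the (partly wrong) tags alongside the prediction-set success rates for both models. Varying `n_flipped` or `n_features` gives a rough feel for how strongly a few mislabeled samples can move the hyperplane under these assumed conditions.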