Interest in sports predictions and the public availability of large amounts of structured and unstructured data are increasing every day. Because sporting events are not completely independent events but are shaped by the human factor, choosing an adequate analysis process is very important. In this paper, seven classification machine learning algorithms are used and validated with two validation methods: Train&Test and cross-validation. The validation methods are analyzed and critically reviewed, and the obtained results are compared. Across the algorithms used, the best average prediction results were obtained with the nearest neighbors algorithm and the worst with decision trees. The cross-validation method produced better results than the Train&Test validation method. The prediction results of the Train&Test validation method using disjoint datasets and using up-to-date data were also compared; better results were obtained with up-to-date data. Directions for future research are also explained.

Symmetry 2020, 12, 431

The aim of this paper is to determine, by comparing classification machine learning algorithms in predicting basketball game outcomes, which algorithm, validation method, and data preparation method produce better prediction results.

This paper demonstrates the impact that different validation methods have on prediction accuracy when different ML algorithms are used.
Moreover, the impact of the choice of validation method on the prediction results when ML is applied to disjoint datasets or to up-to-date data is revealed, which enables recommendations for the most appropriate combination of ML algorithm and validation method, depending on the available datasets.

After this introduction and the overview of research related to sports outcomes, the second chapter provides basic information about the classification machine learning algorithms and the validation methods applied in this research. The third chapter describes the data acquisition and data preparation procedures. The research results are presented and discussed in the fourth chapter, and the conclusions are given at the end of the paper.
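The difference between the two validation methods named above, Train&Test and cross-validation, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic "games" (two rating features and a win/loss label), the hand-written nearest neighbors classifier, the 75/25 split, and the choice of five folds are not the data, implementation, or settings used in this research, and the printed accuracies say nothing about the reported results.

```python
import random
from collections import Counter

def make_games(n, seed=0):
    """Synthetic stand-in for game records: ((home_rating, away_rating), label)."""
    rng = random.Random(seed)
    games = []
    for _ in range(n):
        home_rating = rng.gauss(0, 1)
        away_rating = rng.gauss(0, 1)
        # Assumed rule: the home team wins when its noisy rating edge is positive.
        label = int(home_rating - away_rating + rng.gauss(0, 0.5) > 0)
        games.append(((home_rating, away_rating), label))
    return games

def knn_predict(train, x, k=5):
    """Majority vote among the k training games closest to x (squared Euclidean distance)."""
    nearest = sorted(
        train, key=lambda g: sum((a - b) ** 2 for a, b in zip(g[0], x))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k=5):
    """Share of test games whose outcome is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def train_test_accuracy(games, test_fraction=0.25, k=5):
    """Train&Test: fit on the first part, evaluate once on the held-out rest."""
    split = int(len(games) * (1 - test_fraction))
    return accuracy(games[:split], games[split:], k)

def cross_val_accuracy(games, folds=5, k=5):
    """k-fold cross-validation: every game is used for testing exactly once."""
    scores = []
    for i in range(folds):
        test = games[i::folds]
        train = [g for j, g in enumerate(games) if j % folds != i]
        scores.append(accuracy(train, test, k))
    return sum(scores) / len(scores)

games = make_games(200)
holdout = train_test_accuracy(games)
cv = cross_val_accuracy(games)
print(f"Train&Test accuracy: {holdout:.3f}")
print(f"5-fold CV accuracy:  {cv:.3f}")
```

The sketch makes one structural point only: Train&Test yields a single estimate from one held-out part of the data, whereas cross-validation averages several estimates so that each game contributes to both training and testing.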
Related Literature Review

The most common algorithms for predicting outcomes in sports are neural networks, coupled with the Train&Test validation method. The authors of [1] used a variety of neural networks and Train&Test validation to predict game outcomes in the National Basketball Association (NBA) league, with the best results of more than 70%. In [2], the authors used 37 algorithms in the Waikato Environment for Knowledge Analysis (WEKA) with the Train&Test validation method. The best result was 72.8%, showing that the best classifiers have 5% better precision than the reference classifier, which favors the team with the better rating. The authors of [3] used logistic regression, Naïve Bayes, Support Vector Machine (S...