The growing volume of structured and unstructured data, in sport and beyond, has driven the development and wider use of techniques that extract information and apply machine-learning algorithms to predict process outcomes from input data, without necessarily relying on output data. In sport, predicting outcomes and extracting valuable information has become appealing not only to people working in sport but also to a wider audience, particularly in team management and sports betting. The aim of this article is to review existing machine learning (ML) algorithms for predicting sport outcomes. Over 100 papers were analyzed, and only a subset of them was taken into consideration. Almost all of the analyzed papers use some form of feature selection and feature extraction, most often before applying the machine-learning algorithm. To evaluate ML algorithms, researchers in most cases use data segmentation, with the data ordered chronologically. In addition to data segmentation, researchers also use k-fold cross-validation. Sport prediction is usually treated as a classification problem in which a single class is predicted; in rare cases the prediction is a numerical value. The most commonly used ML models are neural networks evaluated with data segmentation.
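The two evaluation schemes named above, chronological data segmentation and k-fold cross-validation, can be contrasted with a minimal sketch. It assumes each match record carries a sortable date field; the function names `chronological_split` and `k_fold_indices`, the toy records, and the 0.75 training fraction are illustrative assumptions and are not taken from the reviewed papers.

```python
from typing import Dict, List, Sequence, Tuple


def chronological_split(
    records: Sequence[Dict], train_fraction: float = 0.75
) -> Tuple[List[Dict], List[Dict]]:
    """Sort records by date, then train on earlier ones and test on later ones."""
    ordered = sorted(records, key=lambda r: r["date"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


# Hypothetical match records; ISO dates sort correctly as strings.
matches = [
    {"date": "2021-03-01", "home_win": 1},
    {"date": "2020-09-12", "home_win": 0},
    {"date": "2021-01-20", "home_win": 1},
    {"date": "2020-11-05", "home_win": 0},
]

train, test = chronological_split(matches, train_fraction=0.75)
folds = k_fold_indices(len(matches), k=2)
```

In the chronological split the model is only ever tested on matches played after the ones it was trained on, which mirrors how a prediction would be used in practice. Plain k-fold cross-validation gives every sample a turn in the test set, but it can place later matches in the training set unless the folds are constructed with time order in mind.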
In recovering information from a chart image, the first step should be chart type classification. Many approaches have been used over time, and some achieve better results than others. The latest articles use a Support Vector Machine (SVM) in combination with a Convolutional Neural Network (CNN), which achieves almost perfect results with datasets of a few thousand images per class. Datasets containing chart images are primarily synthetic and lack real-world examples. To overcome the problem of small datasets, we use a Siamese CNN architecture for chart type classification; to our knowledge, this is the first report of doing so. Multiple network architectures are tested, and the results for different dataset sizes are compared. Network verification is conducted using few-shot learning (FSL). Many of the described advantages of Siamese CNNs are shown in examples. In the end, we show that the Siamese CNN can work with one image per class, and that it achieves a 100% average classification accuracy with 50 images per class, whereas the CNN achieves an average classification accuracy of only 43% on the same dataset.
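How a Siamese network can classify with as little as one image per class can be sketched as a nearest-neighbour comparison in embedding space. This is a minimal illustration of the inference step only, not the authors' architecture: `toy_embed` is a hypothetical stand-in for a trained CNN branch, and the 2x2 "images", the class labels, and the choice of Euclidean distance are assumptions made for the example.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]
Image = List[List[float]]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_few_shot(
    query: Image,
    support: Dict[str, List[Image]],
    embed: Callable[[Image], Vector],
) -> Optional[str]:
    """Return the label of the support example closest to the query.

    Both inputs pass through the same `embed` function, which plays the
    role of the shared-weight branch of a Siamese network.
    """
    query_vec = embed(query)
    best_label, best_dist = None, math.inf
    for label, examples in support.items():
        for example in examples:
            dist = euclidean(query_vec, embed(example))
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label


def toy_embed(image: Image) -> Vector:
    """Hypothetical placeholder for a trained CNN branch: mean intensity per row."""
    return [sum(row) / len(row) for row in image]


# One support image per class (one-shot setting); values are made up.
support_set = {
    "bar": [[[1.0, 1.0], [0.0, 0.0]]],
    "pie": [[[0.0, 1.0], [0.0, 1.0]]],
}
query_image = [[1.0, 0.8], [0.1, 0.0]]

predicted = classify_few_shot(query_image, support_set, toy_embed)
```

Because the decision depends only on distances to the support examples, new chart types could in principle be added by supplying one labelled image each, without retraining a fixed-size output layer.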
Data visualization developed from the need to display vast quantities of information more transparently. Data visualizations often incorporate important information that is not stated anywhere else in the document and enable the reader to discover significant data and retain it in longer-term memory. On the other hand, Internet search engines have difficulty processing data visualizations and connecting a visualization with the request submitted by the user. Data visualizations also leave out blind individuals and individuals with impaired vision. This article uses machine learning to classify data visualizations into 10 classes. The tested model is trained four times on a dataset that is preprocessed through four stages. The achieved accuracy of 89% is comparable to the results of other methods. It is shown that image processing can affect the results: increasing or decreasing the level of detail in an image significantly affects the average classification accuracy.
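The abstract does not name the four preprocessing stages, so the following is only an illustration of what decreasing the level of detail in an image could look like. It assumes a grayscale image stored as nested lists and uses block-average downsampling; the function name `downsample` and the sample pixel values are hypothetical.

```python
from typing import List

Image = List[List[float]]


def downsample(image: Image, factor: int) -> Image:
    """Average non-overlapping factor x factor blocks.

    Rows and columns that do not fill a complete block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    rows = len(image) // factor
    cols = len(image[0]) // factor if image else 0
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [
                image[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            out_row.append(sum(block) / len(block))
        result.append(out_row)
    return result


# A 4x4 image with a one-pixel-wide vertical line (made-up values).
thin_line = [
    [0, 255, 0, 0],
    [0, 255, 0, 0],
    [0, 255, 0, 0],
    [0, 255, 0, 0],
]

reduced = downsample(thin_line, 2)
```

After downsampling, the one-pixel line is blended with its background into a mid-grey block, which shows how fine structures such as thin chart lines can fade when detail is reduced.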