Pattern recognition with little training data is a particularly relevant problem, arising whenever collecting training data is expensive or essentially impossible. This work proposes a new probabilistic model, MC&CL (Markov Chain and Clusters), which combines a Markov chain with a clustering algorithm (Kohonen's self-organizing map, the k-means method) to classify sequences of observations when the training set is small. The model is compared experimentally with several other popular sequence classification models: HMM (Hidden Markov Model), HCRF (Hidden Conditional Random Fields), LSTM (Long Short-Term Memory), and kNN+DTW (k-Nearest Neighbors with Dynamic Time Warping). The comparison uses synthetic random sequences generated from a hidden Markov model, with noise added to the training samples. The proposed model is shown to achieve the best classification accuracy among the models compared when the amount of training data is small.
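The abstract does not spell out how the Markov chain and the clustering step are combined, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that observations are quantized to their nearest k-means centroid, that one first-order Markov chain over cluster indices is estimated per class with additive smoothing, and that a sequence is assigned to the class whose chain gives it the highest log-likelihood. All function names, the deterministic centroid initialization, and the smoothing constant `alpha` are illustrative assumptions.

```python
import numpy as np

def assign(points, centroids):
    """Index of the nearest centroid for every point (points: (n, d))."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means. Initial centroids are k points spread evenly
    over the data sorted by the first feature (an assumption made here to
    keep the sketch deterministic)."""
    order = np.argsort(points[:, 0])
    picks = order[np.linspace(0, len(points) - 1, k).astype(int)]
    centroids = points[picks].astype(float)
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def fit_chain(sequences, centroids, alpha=1.0):
    """Estimate log start and log transition probabilities of a first-order
    Markov chain over cluster indices, with additive smoothing alpha."""
    k = len(centroids)
    start = np.full(k, alpha)
    trans = np.full((k, k), alpha)
    for seq in sequences:
        states = assign(seq, centroids)
        start[states[0]] += 1
        for a, b in zip(states[:-1], states[1:]):
            trans[a, b] += 1
    return np.log(start / start.sum()), np.log(trans / trans.sum(axis=1, keepdims=True))

def log_likelihood(seq, centroids, log_start, log_trans):
    """Log-probability of the quantized sequence under one class chain."""
    states = assign(seq, centroids)
    return log_start[states[0]] + log_trans[states[:-1], states[1:]].sum()

def classify(seq, centroids, chains):
    """Return the label whose chain assigns the sequence the highest score."""
    scores = {label: log_likelihood(seq, centroids, *chain)
              for label, chain in chains.items()}
    return max(scores, key=scores.get)

def as_seq(values):
    """Turn a list of scalars into a (T, 1) observation sequence."""
    return np.array(values, dtype=float).reshape(-1, 1)

# Toy data (invented for illustration): one class lingers at a level,
# the other alternates between two levels.
sticky = [as_seq([0, 0, 0, 0, 5, 5, 5, 5]), as_seq([5, 5, 5, 0, 0, 0, 0, 0])]
alternating = [as_seq([0, 5, 0, 5, 0, 5, 0, 5]), as_seq([5, 0, 5, 0, 5, 0, 5, 0])]

centroids = kmeans(np.vstack(sticky + alternating), k=2)
chains = {
    "sticky": fit_chain(sticky, centroids),
    "alternating": fit_chain(alternating, centroids),
}
prediction = classify(as_seq([0.2, 0.1, -0.3, 5.1, 4.9, 5.2]), centroids, chains)
```

With only two training sequences per class, each chain has just a handful of smoothed parameters to estimate, which is one reason a scheme of this kind can remain usable when training data are scarce.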
This article proposes a generative probabilistic graphical model with hidden states, the Neural Gas Graphical Model (NGGM), which approximates the data with a grid of "neural gas" nodes and is aimed at classifying long time series. Such time series include records of economic, weather, and health indicators, as well as readings from the operational sensors of technical objects over a fairly long period. The most difficult task in classifying long time series with hidden-state probabilistic graphical models is selecting the optimal number of hidden states. This work proposes a method that selects the optimal number of hidden states automatically during model learning. The proposed model and its learning methods combine elements of the metric and Bayesian approaches to classification. The basic purpose of NGGM is to match the hidden states of the graphical model to the nodes (neurons) of the approximating grid. The classification quality of NGGM was compared with that of the currently most common time series classification models, HMM (Hidden Markov Model) and HCRF (Hidden Conditional Random Fields), on data sets from the UCI repository. Quality was assessed by the macro-averaged F-measure using k-fold cross-validation. The analysis showed that NGGM achieved better classification quality than the HCRF and HMM models on a data set of multiple labeled samples of pen tip trajectories recorded while writing individual characters.
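The evaluation protocol named above, a macro-averaged F-measure under k-fold cross-validation, can be sketched as follows. This is a generic illustration rather than the authors' evaluation code: the function names, the shuffling seed, and the stand-in `nearest_centroid` classifier are assumptions made here, and per-class scores are averaged over every label that occurs in either the true or the predicted labels of a fold.

```python
import numpy as np

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2*TP / (2*TP + FP + FN)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        # The denominator is never zero: c occurs in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return float(np.mean(scores))

def kfold_indices(n, k, seed=0):
    """Yield (train, test) index arrays for k shuffled, disjoint folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def cross_validated_macro_f1(X, y, fit_predict, k=5, seed=0):
    """Mean macro-F1 over k folds. fit_predict(X_train, y_train, X_test)
    must return predicted labels for X_test."""
    X, y = np.asarray(X), np.asarray(y)
    scores = [
        macro_f1(y[test], fit_predict(X[train], y[train], X[test]))
        for train, test in kfold_indices(len(y), k, seed)
    ]
    return float(np.mean(scores))

def nearest_centroid(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: predict the class of the closest
    class mean. Any model with the same call shape could be plugged in."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Toy, well-separated data invented for illustration.
X_demo = np.array([[0.0], [0.1], [0.2], [0.3], [0.4],
                   [5.0], [5.1], [5.2], [5.3], [5.4]])
y_demo = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
cv_score = cross_validated_macro_f1(X_demo, y_demo, nearest_centroid, k=5)
```

Macro-averaging gives every class the same weight regardless of how many samples it has, which matters for multi-class data such as handwritten characters where some classes may be rarer than others.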