Markov models are commonly used to analyze real-world problems. Their combination of discrete states and stochastic transitions is suited to applications with deterministic and stochastic components. Hidden Markov models (HMMs) are a class of Markov models commonly used in pattern recognition. Currently, HMMs recognize patterns using a maximum-likelihood approach. One major drawback with this approach is that data observations are mapped to HMMs without considering the number of data samples available. Another problem is that this approach is only useful for choosing between HMMs. It does not provide a criterion for determining whether or not a given HMM adequately matches the data stream. In this paper, we recognize complex behaviors using HMMs and confidence intervals. The certainty of a data match increases with the number of data samples considered. Receiver operating characteristic curves are used to find the optimal threshold for either accepting or rejecting an HMM description. We present one example using a family of HMMs to show the utility of the proposed approach. A second example using models extracted from a database of consumer purchases provides additional evidence that this approach can perform better than existing techniques.
In this paper, we present a normalized statistical metric space for hidden Markov models (HMMs). HMMs are widely used to model real-world systems. Like graph matching, some previous approaches compare HMMs by evaluating the correspondence, or goodness of match, between every pair of states, concentrating on the structure of the models instead of the statistics of the process being observed. To remedy this, we present a new metric space that compares the statistics of HMMs within a given level of statistical significance. Compared with the Kullback-Leibler divergence, which is another widely used approach for measuring model similarity, our approach is a true metric, can always return an appropriate distance value, and provides a confidence measure on the metric value. Experimental results are given for a sample application, which quantify the similarity of HMMs of network traffic in the Tor anonymization system. This application is interesting since it considers models extracted from a system that is intentionally trying to obfuscate its internal workings. In the conclusion, we discuss applications in less-challenging domains, such as data mining.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.