For many types of machine learning algorithms, one can compute the statistically "optimal" way to select training data. In this paper, we review how optimal data selection techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are computationally expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate. Empirically, we observe that the optimality criterion sharply decreases the number of training examples the learner needs in order to achieve good performance.
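To give a concrete feel for variance-based data selection, here is a minimal Python sketch for ordinary linear regression, where the variance of the prediction at input x under the current training set X is sigma^2 * x^T (X^T X)^{-1} x. Greedily querying the candidate with the largest predictive variance is a simplified proxy for the expected-variance-reduction criterion the paper derives; the function names, dimensions, and data below are illustrative assumptions, not the paper's implementation.

import numpy as np

rng = np.random.default_rng(0)

def predictive_variance(X_train, candidates, noise_var=1.0):
    """Variance of the least-squares prediction at each candidate input."""
    A_inv = np.linalg.inv(X_train.T @ X_train)  # (d, d) inverse Gram matrix
    # Quadratic form x^T A_inv x for every candidate row x.
    return noise_var * np.einsum("nd,de,ne->n", candidates, A_inv, candidates)

def select_query(X_train, candidates):
    """Pick the candidate input where the learner is currently least certain."""
    return candidates[np.argmax(predictive_variance(X_train, candidates))]

# Toy usage: start from a few random inputs, then greedily add queries.
# (Once a point is added, its variance drops, so argmax moves elsewhere.)
X = rng.normal(size=(5, 3))
pool = rng.normal(size=(200, 3))
for _ in range(10):
    x_new = select_query(X, pool)
    X = np.vstack([X, x_new])  # in practice, label x_new and retrain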
Abstract. Hidden Markov models (HMMs) have proven to be one of the most widely used tools for learning probabilistic models of time series data. In an HMM, information about the past is conveyed through a single discrete variable: the hidden state. We discuss a generalization of HMMs in which this state is factored into multiple state variables and is therefore represented in a distributed manner. We describe an exact algorithm for inferring the posterior probabilities of the hidden state variables given the observations, and relate it to the forward-backward algorithm for HMMs and to algorithms for more general graphical models. Due to the combinatorial nature of the hidden state representation, this exact algorithm is intractable. As in other intractable systems, approximate inference can be carried out using Gibbs sampling or variational methods. Within the variational framework, we present a structured approximation in which the state variables are decoupled, yielding a tractable algorithm for learning the parameters of the model. Empirical comparisons suggest that these approximations are efficient and provide accurate alternatives to the exact methods. Finally, we use the structured approximation to model Bach's chorales and show that factorial HMMs can capture statistical structure in this data set which an unconstrained HMM cannot.
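The following Python sketch illustrates the factored hidden state: M independent Markov chains evolve in parallel, and each observation depends on the states of all chains. The parameterization here (per-chain contributions summed into a Gaussian observation mean) is one common choice for factorial HMMs; the dimensions, names, and noise model are illustrative assumptions rather than the paper's exact setup.

import numpy as np

rng = np.random.default_rng(0)
M, K, T, D = 3, 2, 50, 4  # chains, states per chain, time steps, obs dim

A = [rng.dirichlet(np.ones(K), size=K) for _ in range(M)]  # per-chain transition matrices
pi = [rng.dirichlet(np.ones(K)) for _ in range(M)]         # per-chain initial distributions
W = rng.normal(size=(M, K, D))                             # per-chain, per-state mean contributions
noise = 0.1

states = np.zeros((T, M), dtype=int)
obs = np.zeros((T, D))
for t in range(T):
    for m in range(M):
        # Each chain transitions independently of the others a priori.
        p = pi[m] if t == 0 else A[m][states[t - 1, m]]
        states[t, m] = rng.choice(K, p=p)
    # The observation couples the chains: its mean sums one contribution per chain.
    mean = sum(W[m, states[t, m]] for m in range(M))
    obs[t] = mean + noise * rng.normal(size=D)

Because the chains are coupled only through the observations, the joint state space has K^M configurations, which is what makes exact posterior inference intractable and motivates the structured variational approximation.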
Introduction

Suppose we wish to build a model of data from a finite sequence of ordered observations, {Y_1, Y_2, ..., Y_t}. In most realistic scenarios, from modeling stock prices to physiological data, the observations are not related deterministically. Furthermore, there is added uncertainty resulting from the limited size of our data set and any mismatch between our model and the true process. Probability theory provides a powerful tool for expressing both randomness and uncertainty in our model [23]. We can express the uncertainty in our prediction of the future outcome Y_{t+1} via a probability density P(Y_{t+1} | Y_1, ..., Y_t). Such a probability density can then be used to make point predictions, define error bars, or make decisions that are expected to minimize some loss function.

This chapter presents a probabilistic framework for learning models of temporal data. We express these models using the Bayesian network formalism (a.k.a. probabilistic graphical models or belief networks), a marriage of probability theory and graph theory in which dependencies between variables are expressed graphically. The graph not only allows the user to understand which variables affect which other ones, but also serves as the backbone for efficiently computing the marginal and conditional probabilities that may be required for inference and learning.

The next section provides a brief tutorial on Bayesian networks. Section 3 demonstrates the use of Bayesian networks for modeling time series, including some well-known examples such as the Kalman filter and the hidden Markov model. Section 4 focuses on the problem of learning the parameters of a Bayesian network using the Expectation-Maximization (EM) algorithm [3,10]. Section 5 describes some richer models appropriate for time series with nonlinear or multiresolution structure. Inference in such models may be computationally intractable. However, in section 6 we present several tractable methods for approximate inference which can be used as the basis for learning.

A Bayesian network tutorial

A Bayesian network is simply a graphical model for representing conditional independencies between a set of random variables. Consider four random variables,
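To make the factorization concrete, the following shows how a Bayesian network over four variables encodes conditional independencies. The excerpt cuts off before the original example, so the graph used below is a hypothetical illustration, not necessarily the one in the chapter.

% By the chain rule, any joint density over four variables factors as
% P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C).
% A Bayesian network drops the dependencies absent from its graph; for the
% hypothetical graph A -> B, A -> C, {B, C} -> D, this simplifies to
\[
  P(A, B, C, D) \;=\; P(A)\, P(B \mid A)\, P(C \mid A)\, P(D \mid B, C),
\]
% so C is conditionally independent of B given A, and D depends only on
% its parents B and C. Each variable needs a conditional distribution
% given its parents alone, which is what makes inference tractable.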