Abstract: Non-symmetric Kullback-Leibler divergence (KLD) measures the proximity of probability density functions (pdfs). Bernardo (Ann. Stat. 1979; 7(3):686-690) showed its unique role in the approximation of pdfs. The order of the KLD arguments is also implied by his methodological result. Functional approximation of estimation and stabilized forgetting, which serve for tracking slowly varying parameters, use the reversed order. This choice has a pragmatic motivation: the recursive estimator often approximates the parametric …
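For orientation, a minimal sketch of the two argument orders the abstract contrasts, in standard notation not drawn from the (truncated) abstract itself, with f the pdf being approximated and g its approximation:

\[ D(f \,\|\, g) = \int f(\theta)\,\ln\frac{f(\theta)}{g(\theta)}\,\mathrm{d}\theta, \qquad D(g \,\|\, f) = \int g(\theta)\,\ln\frac{g(\theta)}{f(\theta)}\,\mathrm{d}\theta. \]

On the usual reading, Bernardo's result singles out D(f‖g) as the appropriate loss when g approximates f; the reversed order referred to above is then D(g‖f).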
“…Although the resulting model does not specify the transition matrix of the Markov chain explicitly, Smith and Miller (1986) argued that this is not a defect of the model, since the data provide information about π_{t−1|t−1,k} and π_{t|t−1,k}, but no additional information about Q. Exponential forgetting has been used for updating discrete probabilities in a different two-hypothesis context by Kárný and Andrýsek (2009).…”
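As a concrete reading of the quoted passage, here is a minimal sketch, not taken from either cited paper, of how exponential forgetting can update discrete model probabilities without an explicit transition matrix Q; the forgetting factor alpha and the function names are illustrative:

import numpy as np

def forget(pi_post_prev, alpha=0.99):
    # Prediction step via forgetting: flatten the last posterior model
    # probabilities instead of multiplying by a transition matrix Q.
    w = pi_post_prev ** alpha
    return w / w.sum()          # plays the role of pi_{t|t-1,k}

def bayes_update(pi_pred, likelihoods):
    # Correction step: reweight the predicted probabilities by each
    # model's predictive likelihood of the new observation.
    w = pi_pred * likelihoods
    return w / w.sum()          # plays the role of pi_{t|t,k}

# Example: three candidate models, the second fitting the new data best.
pi = np.array([0.5, 0.3, 0.2])
pi_pred = forget(pi, alpha=0.95)
pi = bayes_update(pi_pred, likelihoods=np.array([0.1, 0.8, 0.4]))

Raising the probabilities to a power alpha < 1 pulls them toward uniformity, which mimics a gradual switching of the "correct" model while leaving Q unspecified, exactly as the quote describes.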
We consider the problem of online prediction when it is uncertain which prediction model is best. We develop a method called Dynamic Model Averaging (DMA) in which a state space model for the parameters of each model is combined with a Markov chain model for the correct model. This allows the "correct" model to vary over time. The state space and Markov chain models are both specified in terms of forgetting, leading to a highly parsimonious representation. As a special case, when the model and parameters do not change, DMA is a recursive implementation of standard Bayesian model averaging, which we call recursive model averaging. The method is applied to the problem of predicting the output strip thickness for a cold rolling mill, where the output is measured with a time delay. We found that when only a small number of physically motivated models were considered and one was clearly best, the method quickly converged to the best model, and the cost of model uncertainty was small; indeed DMA performed slightly better than the best physical model. When model uncertainty and the number of models considered were large, our method ensured that the penalty for model uncertainty was small. At the beginning of the process, when control is most difficult, we found that DMA over a large model space led to better predictions than the single best performing physically motivated model. We also applied the method to several simulated examples, and found that it recovered both constant and time-varying regression parameters and model specifications quite well.
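A compressed sketch of one DMA recursion step under simplifying assumptions (scalar observation, known noise variance sigma2); the names dma_step, lam, and alpha are illustrative, not the paper's notation. The state space forgetting enters by inflating each model's parameter covariance, the Markov chain forgetting by flattening the model probabilities:

import numpy as np

def dma_step(models, pi, x_t, y_t, lam=0.99, alpha=0.99, sigma2=1.0):
    # Model-probability prediction via forgetting (no explicit transition matrix).
    pi_pred = pi ** alpha
    pi_pred /= pi_pred.sum()

    lik = np.empty(len(models))
    for k, m in enumerate(models):
        x = x_t[m["idx"]]                    # regressors used by model k
        P_pred = m["P"] / lam                # parameter forgetting: inflate covariance
        y_hat = x @ m["theta"]               # one-step prediction of y_t
        S = x @ P_pred @ x + sigma2          # predictive variance
        lik[k] = np.exp(-0.5 * (y_t - y_hat) ** 2 / S) / np.sqrt(2 * np.pi * S)
        K = P_pred @ x / S                   # gain for the parameter update
        m["theta"] = m["theta"] + K * (y_t - y_hat)
        m["P"] = P_pred - np.outer(K, x @ P_pred)

    # Model-probability correction: reweight by predictive likelihoods.
    pi_post = pi_pred * lik
    return pi_post / pi_post.sum()

# Example: two candidate regressions over a 3-dimensional regressor vector.
models = [
    {"idx": np.array([0, 1]), "theta": np.zeros(2), "P": np.eye(2)},
    {"idx": np.array([0, 2]), "theta": np.zeros(2), "P": np.eye(2)},
]
pi = np.array([0.5, 0.5])
pi = dma_step(models, pi, x_t=np.array([1.0, 0.3, -0.7]), y_t=0.9)

With lam = alpha = 1 the recursion reduces to standard recursive Bayesian model averaging, matching the special case noted in the abstract.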
“…At the Bayesian level, a sort of forgetting arises through combining the posterior pdf with its flattened alternative. The combination strategies prominently involve the nonsymmetric Kullback-Leibler divergence (KLD) [8] with different properties depending on the order of the KLD arguments [9]. There is rich literature on the adaptation of a single forgetting factor causing the information about all of the system parameters to be uniformly discounted [10]-[13].…”
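The combination the quote alludes to can be made explicit. A standard sketch, following the usual stabilized-forgetting formulation rather than any specific equation of [8] or [9]: with posterior p, flattened alternative p_alt, and forgetting factor λ ∈ (0, 1], minimizing the λ-weighted KLD with the unknown pdf q as the first argument yields the normalized geometric mean,

\[ \hat{q} = \arg\min_{q}\; \lambda\, D(q \,\|\, p) + (1-\lambda)\, D(q \,\|\, p_{\mathrm{alt}}) \;\propto\; p(\theta)^{\lambda}\, p_{\mathrm{alt}}(\theta)^{1-\lambda}, \]

whereas reversing the KLD arguments, minimizing λ D(p‖q) + (1−λ) D(p_alt‖q), yields the arithmetic mixture λ p + (1−λ) p_alt. This is the order dependence the quoted passage points to.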