We propose a framework for general probabilistic multi-step time series regression. Specifically, we exploit the expressiveness and temporal nature of Sequence-to-Sequence Neural Networks (e.g., recurrent and convolutional structures), the nonparametric nature of Quantile Regression, and the efficiency of Direct Multi-Horizon Forecasting. A new training scheme, forking-sequences, is designed for sequential nets to boost stability and performance. We show that the approach accommodates both temporal and static covariates, learning across multiple related series, shifting seasonality, future planned event spikes, and cold starts in real-life, large-scale forecasting. The performance of the framework is demonstrated in an application to predict the future demand of items sold on Amazon.com, and in a public probabilistic forecasting competition to predict electricity price and load.
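The abstract only names the ingredients; as a minimal illustrative sketch (not the paper's implementation), a multi-horizon quantile output of the kind described above can be trained with the standard quantile (pinball) loss. The NumPy function below assumes a prediction tensor of shape (batch, horizon, quantile); the function name and toy data are hypothetical.

    import numpy as np

    def pinball_loss(y_true, y_pred, quantiles):
        """Quantile (pinball) loss averaged over samples, horizons and quantiles.

        y_true    : array of shape (batch, horizons)
        y_pred    : array of shape (batch, horizons, n_quantiles)
        quantiles : sequence of length n_quantiles, e.g. [0.1, 0.5, 0.9]
        """
        q = np.asarray(quantiles).reshape(1, 1, -1)
        diff = y_true[..., None] - y_pred              # positive when we under-predict
        loss = np.maximum(q * diff, (q - 1.0) * diff)  # q*|err| above, (1-q)*|err| below
        return loss.mean()

    # toy usage: 2 series, 3 future horizons, quantiles 0.1 / 0.5 / 0.9
    y_true = np.array([[10.0, 12.0, 9.0],
                       [ 5.0,  4.0, 6.0]])
    y_pred = np.stack([y_true - 2.0, y_true, y_true + 2.0], axis=-1)
    print(pinball_loss(y_true, y_pred, [0.1, 0.5, 0.9]))

In a direct multi-horizon setup, the network emits all horizon-by-quantile predictions at once and this loss is summed over them, rather than rolling a one-step model forward.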
Blind separation of independent sources from their convolutive mixtures is a problem that arises in many real-world multi-sensor applications. In this paper we present a solution to this problem based on the information maximization principle, which was recently proposed by Bell and Sejnowski for the blind separation of instantaneous mixtures. We present a feedback network architecture capable of coping with convolutive mixtures, and we derive the adaptation equations for the adaptive filters in the network by maximizing the information transferred through the network. Examples using speech signals are presented to illustrate the algorithm.
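The paper derives adaptation equations for the adaptive filters of a feedback network handling convolutive mixtures; those equations are not reproduced here. As a rough point of reference only, the sketch below implements the instantaneous Infomax rule of Bell and Sejnowski that the derivation builds on, using a natural-gradient style batch update in NumPy; the function name, learning rate, and toy mixing matrix are assumptions.

    import numpy as np

    def infomax_separate(X, lr=0.02, n_iters=300, seed=0):
        """Instantaneous Infomax separation (Bell & Sejnowski style).

        X : array of shape (n_sources, n_samples), zero-mean mixed signals.
        Returns an unmixing matrix W such that U = W @ X are the estimated
        sources, using the update dW = lr * (I + (1 - 2*sigmoid(U)) U^T / m) W.
        """
        n, m = X.shape
        rng = np.random.default_rng(seed)
        W = np.eye(n) + 0.01 * rng.standard_normal((n, n))   # start near identity
        for _ in range(n_iters):
            U = W @ X
            Y = 1.0 / (1.0 + np.exp(-U))                      # logistic nonlinearity
            grad = np.eye(n) + (1.0 - 2.0 * Y) @ U.T / m
            W += lr * grad @ W                                # natural-gradient step
        return W

    # toy usage: unmix two instantaneously mixed super-Gaussian sources
    rng = np.random.default_rng(1)
    S = rng.laplace(size=(2, 5000))              # two sparse (speech-like) sources
    A = np.array([[1.0, 0.6], [0.4, 1.0]])       # instantaneous mixing matrix
    W = infomax_separate(A @ S)
    print(np.round(W @ A, 2))                    # roughly a scaled permutation if separation worked

The convolutive case replaces the single matrix W with adaptive filters in a feedback structure, but the same maximize-transferred-information objective drives the adaptation.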
In order to learn discriminative feature transforms, we discuss mutual information between class labels and transformed features as a criterion. Instead of Shannon's definition we use measures based on Renyi entropy, which lends itself to an efficient implementation and to an interpretation of "information potentials" and "information forces" induced by samples of data. This paper presents two routes towards practical usability of the method, especially aimed at large databases: the first is an on-line stochastic gradient algorithm, and the second is based on approximating class densities in the output space by Gaussian mixture models.
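As a hypothetical illustration of the quantities named above, the sketch below estimates Renyi's quadratic entropy with a Gaussian Parzen window: the "information potential" V is the average pairwise kernel value, H2 = -log V, and the "information force" on a sample is the gradient of V with respect to that sample. The kernel width sigma and the function names are assumptions, and the class-label mutual-information machinery of the paper is not reproduced.

    import numpy as np

    def information_potential(X, sigma=1.0):
        """Parzen-window estimate of Renyi's quadratic information potential.

        X : array of shape (n_samples, dim) of transformed features.
        Returns V = (1/N^2) * sum_ij G(x_i - x_j; 2*sigma^2 I), so that
        Renyi's quadratic entropy is estimated as H2 = -log(V).
        """
        n, d = X.shape
        diff = X[:, None, :] - X[None, :, :]                 # pairwise differences
        sq = np.sum(diff ** 2, axis=-1)
        norm = (4.0 * np.pi * sigma ** 2) ** (-d / 2.0)      # Gaussian with variance 2*sigma^2
        return norm * np.exp(-sq / (4.0 * sigma ** 2)).mean()

    def information_forces(X, sigma=1.0):
        """Gradient of the information potential w.r.t. each sample:
        dV/dx_i = -(1/(sigma^2 N^2)) * sum_j G(x_i - x_j; 2*sigma^2) * (x_i - x_j)."""
        n, d = X.shape
        diff = X[:, None, :] - X[None, :, :]
        sq = np.sum(diff ** 2, axis=-1)
        norm = (4.0 * np.pi * sigma ** 2) ** (-d / 2.0)
        w = norm * np.exp(-sq / (4.0 * sigma ** 2))
        return -(w[..., None] * diff).sum(axis=1) / (sigma ** 2 * n ** 2)

    # toy usage: entropy estimate and forces for a 2-D Gaussian sample
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 2))
    print("H2 estimate:", -np.log(information_potential(X, sigma=0.5)))
    print("force on first sample:", information_forces(X, sigma=0.5)[0])

These per-sample gradients are what make a stochastic on-line update straightforward, since each sample's force can be evaluated against a small batch of other samples.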
A classification system typically consists of both a feature extractor (preprocessor) and a classifier. These two components can be trained either independently or simultaneously. The former option has an implementation advantage since the extractor need only be trained once for use with any classifier, whereas the latter has an advantage since it can be used to minimize classification error directly. Certain criteria, such as Minimum Classification Error, are better suited for simultaneous training, whereas other criteria, such as Mutual Information, are amenable to training the feature extractor either independently or simultaneously. Herein, an information-theoretic criterion is introduced and evaluated for training the extractor independently of the classifier. The proposed method uses nonparametric estimation of Renyi's entropy to train the extractor by maximizing an approximation of the mutual information between the class labels and the output of the feature extractor. The evaluations show that the proposed method, even though it uses independent training, performs at least as well as three feature extraction methods that train the extractor and classifier simultaneously.