Real-world problems are commonly characterized by a high feature dimensionality, which hinders the modelling and descriptive analysis of the data. However, some of these data may be irrelevant or redundant for the learning process. Different approaches can be used to reduce this information, improving not only the speed of building models but also their performance and interpretability. In this review, we focus on feature subset selection (FSS) techniques, which select a subset of the original feature set without making any transformation on the attributes. Traditional batch FSS algorithms may not be adequate to efficiently handle large volumes of data, either because memory problems arise or data are received in a sequential manner. Thus, this article aims to survey the state of the art of incremental FSS algorithms, which can perform more efficiently under these circumstances. Different strategies are described, such as incrementally updating feature weights, applying information theory or using rough set-based FSS, as well as multiple supervised and unsupervised learning tasks where the application of FSS is interesting.
The multidimensional classification of multivariate time series deals with the assignment of multiple classes to time-ordered data described by a set of feature variables. Although this challenging task has received almost no attention in the literature, it is present in a wide variety of domains, such as medicine, finance or industry. The complexity of this problem lies in two nontrivial tasks, the learning with multivariate time series in continuous time and the simultaneous classification of multiple class variables that may show dependencies between them. These can be addressed with different strategies, but most of them may involve a difficult preprocessing of the data, high space and classification complexity or ignoring useful interclass dependencies. Additionally, no attention has been given to the development of new multidimensional classifiers of time series based on probabilistic graphical models, even though transparent models can facilitate further understanding of the domain. In this paper, a novel probabilistic graphical model is proposed, which is able to classify a discrete multivariate temporal sequence into multiple class variables while modeling their dependencies. This model extends continuous time Bayesian networks to the multidimensional classification problem, which are able to explicitly represent the behavior of time series that This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.