Feature selection aims to select a feature subset from an original feature set according to a given evaluation criterion. Because it reduces the number of features efficiently, feature selection has become a key data preprocessing method in many data mining tasks. Exhaustive search for an optimal reduced feature subset is infeasible in most cases, so many feature selection strategies have been developed. Among these strategies, fuzzy rough set theory has proved to be an ideal candidate for dealing with uncertain information. This article provides a comprehensive review of fuzzy rough set theory and of two families of feature selection methods built on it: fuzzy rough set based feature selection methods and fuzzy rough neural network based feature selection methods. We review the publications related to fuzzy rough set theory and its applications in feature selection. In addition, the challenges facing the two types of feature selection methods are discussed.
This article is categorized under:
Technologies > Machine Learning
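As a rough illustration of the fuzzy rough set based feature selection reviewed above, the following sketch computes a fuzzy rough dependency degree and uses it in a greedy forward search. It assumes one common formulation (a range-normalized similarity relation per feature, the minimum to combine features, and the Kleene-Dienes implicator for the lower approximation with crisp class labels); the function names are hypothetical and the review itself may cover other variants.

```python
import numpy as np

def fuzzy_similarity(column):
    """Pairwise similarity 1 - |a(x) - a(y)| / range (an assumed relation)."""
    span = column.max() - column.min()
    if span == 0:
        # A constant feature cannot distinguish any pair of samples.
        return np.ones((len(column), len(column)))
    diff = np.abs(column[:, None] - column[None, :])
    return 1.0 - diff / span

def dependency(X, y, subset):
    """Fuzzy rough dependency of the crisp decision y on the features in subset."""
    n = X.shape[0]
    rel = np.ones((n, n))
    for j in subset:
        # Combine per-feature relations with the minimum t-norm.
        rel = np.minimum(rel, fuzzy_similarity(X[:, j]))
    different = y[:, None] != y[None, :]
    # Lower approximation of each sample's own class: the smallest
    # dissimilarity to any sample that carries a different label.
    penalty = np.where(different, 1.0 - rel, 1.0)
    return float(penalty.min(axis=1).mean())

def greedy_select(X, y, tol=1e-9):
    """Add the feature that raises the dependency most until nothing improves."""
    remaining = list(range(X.shape[1]))
    chosen, best = [], 0.0
    while remaining:
        score, j = max(
            ((dependency(X, y, chosen + [k]), k) for k in remaining),
            key=lambda t: (t[0], -t[1]),
        )
        if score <= best + tol:
            break
        chosen.append(j)
        remaining.remove(j)
        best = score
    return chosen, best
```

The greedy search avoids the exhaustive enumeration of subsets mentioned in the abstract, at the cost of possibly missing the globally best subset.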
Skyline query algorithms for wireless sensor networks (WSNs) are widely used in environmental monitoring. Because monitoring spatial positions is a multi-dimensional problem, traditional skyline query strategies incur enormous computational cost and energy consumption. To use sensor energy efficiently, this paper proposes a geometry-based distributed spatial query strategy (GDSSky). First, the paper presents a geometry-based region partition strategy. It uses a skyline area reduction method based on convex hull vertices to quickly query the spatial skyline data related to a specific query area, and a region partition strategy based on triangulation to run distributed queries in each sub-region and reduce the number of comparisons between nodes. Second, a sub-region clustering strategy groups the data in each sub-region into clusters so that queries can run in parallel and save time. Finally, the paper presents a distributed query strategy based on a data node tree that traverses the monitoring locations of all adjacent sensors. It runs separate spatial skyline queries for the spatial skyline data that have already been obtained and for those not yet found, so that the queries proceed in parallel. Extensive simulation results show that GDSSky quickly returns the places that are nearer to the query locations and have larger pollution capacity, and that it significantly reduces WSN energy consumption.
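To make the notion of a spatial skyline concrete, here is a minimal centralized sketch, not the GDSSky algorithm itself. It assumes the usual spatial dominance definition: a point dominates another if it is at least as close to every query location and strictly closer to at least one. The names `dominates` and `spatial_skyline` are hypothetical, and the pollution capacity attribute mentioned in the abstract is left out.

```python
from math import dist

def dominates(p, q, query_points):
    """True if p is no farther than q from every query point and closer to one."""
    dp = [dist(p, s) for s in query_points]
    dq = [dist(q, s) for s in query_points]
    no_worse = all(a <= b for a, b in zip(dp, dq))
    strictly_better = any(a < b for a, b in zip(dp, dq))
    return no_worse and strictly_better

def spatial_skyline(points, query_points):
    """Naive O(n^2) scan: keep every point that no other point dominates."""
    return [
        p for p in points
        if not any(dominates(o, p, query_points) for o in points if o != p)
    ]

# Two query locations and four candidate monitoring locations.
queries = [(0, 0), (4, 0)]
sites = [(2, 0), (2, 3), (0, 0), (5, 5)]
skyline = spatial_skyline(sites, queries)
```

Every pairwise comparison here is exactly the kind of work that the region partition and clustering steps described above aim to reduce, since in a WSN each comparison between nodes also costs communication energy.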
A labeled graph is a special structure with node identification capability, often used in information networks, biological networks, and other fields. The subgraph query is widely used as an important means of graph data analysis. As a labeled graph grows and changes dynamically, users tend to focus on the high-match results that interest them, and they want to exploit the relationships among results and the number of results to obtain query answers quickly. For this reason, we consider the individual needs of users and propose a dynamic Top-K interesting subgraph query. The method establishes a novel graph topology feature index (GTSF index), comprising a node topology feature index (NTF index) and an edge feature index (EF index), which effectively prunes and filters the invalid nodes and edges that do not meet the restriction conditions. A multi-factor candidate set filtering strategy based on the GTSF index then prunes further to obtain smaller candidate sets. Next, we propose a dynamic Top-K interesting subgraph query method based on the sliding window idea, which updates the subgraph matching results as the labeled graph evolves, so that query results remain accurate in real time. In addition, to account for factors such as frequent input/output (I/O) and network communication overheads, we propose an optimization mechanism for graph changes and an incremental maintenance strategy for the index, which reduce the high cost of redundant operations and global updates. The experimental results show that the proposed method handles dynamic Top-K interesting subgraph queries on large-scale labeled graphs effectively, and that the optimization mechanism for graph changes and the incremental maintenance strategy for the index effectively reduce maintenance overheads.
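The candidate filtering step can be illustrated with a generic pruning rule that is standard in subgraph matching. This is a sketch under stated assumptions, not the GTSF index of the paper: graphs are adjacency dictionaries, and a data node is kept as a candidate for a query node only if the labels match, its degree is large enough, and its neighbors cover the neighbor labels the query node requires. All names are hypothetical.

```python
from collections import Counter

def neighbor_labels(graph, labels, node):
    """Multiset of the labels found on the neighbors of node."""
    return Counter(labels[n] for n in graph[node])

def candidates(query, q_labels, data, d_labels):
    """Map each query node to the data nodes that survive the filter."""
    result = {}
    for u in query:
        need = neighbor_labels(query, q_labels, u)
        result[u] = [
            v for v in data
            if d_labels[v] == q_labels[u]          # same node label
            and len(data[v]) >= len(query[u])      # enough incident edges
            # Counter subtraction keeps only unmet requirements.
            and not (need - neighbor_labels(data, d_labels, v))
        ]
    return result

# Query: a single A-B edge. Data: path 1(A) - 2(B) - 3(A) plus isolated 4(A).
query = {"a": ["b"], "b": ["a"]}
q_labels = {"a": "A", "b": "B"}
data = {1: [2], 2: [1, 3], 3: [2], 4: []}
d_labels = {1: "A", 2: "B", 3: "A", 4: "A"}
cands = candidates(query, q_labels, data, d_labels)
```

In this example node 4 is discarded before any matching is attempted, because it has the right label but no neighbors. Filters of this kind shrink the candidate sets that a later matching and ranking stage has to examine.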