Feature subset selection is an important problem in several fields, including data mining, machine learning, and pattern recognition. It refers to the problem of selecting useful features that are neither irrelevant nor redundant. Because most data acquired from different sources are not in a suitable form for mining useful patterns, feature selection is applied to filter out useless features. However, feature selection is a combinatorial optimization problem, so exhaustively generating and evaluating all possible subsets is intractable in terms of computational cost, memory usage, and processing time. A mechanism is therefore required that intelligently searches for a useful set of features in polynomial time. In this study, a feature subset selection algorithm based on conditional mutual information and ant colony optimization is proposed. The proposed method is a pure filter-based feature subset selection technique that incurs low computational cost and is proficient in terms of classification accuracy. Moreover, along with high accuracy, it selects fewer features. Extensive experiments are performed on thirteen benchmark datasets with a number of well-known classification algorithms. The empirical results support the efficiency and effectiveness of the proposed method.
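As a rough illustration of the relevance measure named in this abstract, the sketch below estimates conditional mutual information I(X; Y | Z) from discrete samples. It is a minimal sketch under stated assumptions, not the authors' algorithm: the function name `conditional_mutual_information` and the toy data are hypothetical, the estimate uses plain frequency counts, and the ant colony search over subsets is not shown.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(x, y, z):
    """Estimate I(X; Y | Z) in bits from three equal-length discrete sequences.

    Hypothetical helper for illustration; probabilities are plain
    frequency counts, with no smoothing or bias correction.
    """
    n = len(x)
    count_xyz = Counter(zip(x, y, z))
    count_xz = Counter(zip(x, z))
    count_yz = Counter(zip(y, z))
    count_z = Counter(z)

    total = 0.0
    for (xi, yi, zi), n_xyz in count_xyz.items():
        # p(x,y,z) * log2( p(z) * p(x,y,z) / (p(x,z) * p(y,z)) ),
        # written with raw counts because the 1/n factors cancel.
        ratio = (count_z[zi] * n_xyz) / (count_xz[(xi, zi)] * count_yz[(yi, zi)])
        total += (n_xyz / n) * log2(ratio)
    return total


# Toy data (assumed): a candidate feature that matches the class label exactly.
feature = [0, 0, 1, 1]
label = [0, 0, 1, 1]
constant = [0, 0, 0, 0]  # an already-selected feature that carries no information

cmi_given_constant = conditional_mutual_information(feature, label, constant)
cmi_given_itself = conditional_mutual_information(feature, label, feature)
```

In this toy case the candidate contributes one full bit when the conditioning feature is uninformative, and nothing when it duplicates a feature that has already been selected, which is the sense in which conditional mutual information penalizes redundancy.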
Feature selection is considered to be one of the most critical methods for choosing appropriate features from a larger set of items. This task requires two basic steps: ranking and filtering. The former requires ranking all features, while the latter involves filtering out all irrelevant features based on some threshold value. Several feature selection methods with well-documented capabilities and limitations have already been proposed. Feature ranking is also nontrivial, as it requires the designation of an optimal cutoff value so as to properly select important features from a list of candidate features. However, the lack of a comprehensive feature ranking and filtering approach that alleviates the existing limitations and provides an efficient mechanism for achieving optimal results is a major problem. In view of these facts, we present an efficient and comprehensive univariate ensemble-based feature selection (uEFS) methodology to select informative features from an input dataset. For the uEFS methodology, we first propose a unified features scoring (UFS) algorithm to generate a final ranked list of features following a comprehensive evaluation of a feature set. To define cutoff points for removing irrelevant features, we subsequently present a threshold value selection (TVS) algorithm to select a subset of features that are deemed important for classifier construction. The uEFS methodology is evaluated using standard benchmark datasets. The extensive experimental results show that our proposed uEFS methodology provides competitive accuracy and achieves (1) on average around a 7% increase in f-measure, and (2) on average around a 5% increase in predictive accuracy as compared with state-of-the-art methods.
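The two steps described in this abstract, ranking and then filtering by a threshold, can be sketched roughly as follows. The abstract does not specify how UFS combines scores or how TVS picks its cutoff, so everything here is an assumption made for illustration: the min-max normalization, the simple averaging, the fixed threshold, the function names, and the score values are all hypothetical.

```python
def min_max(scores):
    """Rescale one measure's scores to [0, 1] so measures are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {f: (s - lo) / span if span else 0.0 for f, s in scores.items()}


def unified_ranking(score_tables):
    """Average normalized scores across measures and sort best-first.

    Assumed combination rule; the actual UFS algorithm may differ.
    """
    normalized = [min_max(table) for table in score_tables]
    features = normalized[0].keys()
    combined = {
        f: sum(table[f] for table in normalized) / len(normalized)
        for f in features
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


def filter_by_threshold(ranked, threshold):
    """Keep features whose combined score reaches the cutoff."""
    return [feature for feature, score in ranked if score >= threshold]


# Hypothetical univariate scores from two different filter measures.
info_gain = {"age": 0.40, "bp": 0.10, "sugar": 0.25}
chi_square = {"age": 12.0, "bp": 2.0, "sugar": 10.0}

ranked = unified_ranking([info_gain, chi_square])
selected = filter_by_threshold(ranked, threshold=0.5)
```

Normalizing before averaging keeps a measure with a large numeric range, such as the chi-square values here, from dominating the combined ranking.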
This research aims to provide a mechanism that can enhance student experience and confidence at Middle East College. It seeks to find out why students lack experience and confidence when choosing the Web Application Development module at Middle East College. A blended learning environment is considered as a suitable means of increasing both the experience and the confidence level of the students.
Chronic kidney disease (CKD) is one of the leading medical ailments in developing countries. Due to limited healthcare infrastructure and a lack of trained human resources, the CKD problem worsens if it is not addressed in its early stages. Machine learning-based automated diagnosis systems therefore play a vital role in dealing with the CKD problem. In most studies on automated CKD decision modeling, the main emphasis is on enhancing the predictive accuracy of the system. In this study, we focus on the applicability challenges of automated decision systems, taking CKD diagnosis as a case study in the context of developing countries. We propose a cost-sensitive ensemble feature ranking method that takes a more realistic approach to group-based feature selection. Two candidate solutions are proposed for group-based feature selection to meet different objectives. Subsequently, both candidate solutions are combined into a consolidated solution. This is one of the first studies in which cost-sensitive ensemble feature ranking for non-overlapping groups is successfully demonstrated to achieve the stated objectives, i.e., a low-cost and high-accuracy solution. Based on an extensive set of experiments, we demonstrate that a cost-effective and accurate solution for the CKD problem can be obtained. The experiments include 7 well-known classification algorithms and 8 comparative feature selection methods to show the efficacy of the proposed approach. We conclude that the applicability of automated CKD systems can be enhanced by including cost in the objective space of the solution formulation. A trade-off solution can therefore be obtained that is cost-effective and yet accurate enough to serve as a CKD screening system.
Index terms: Ensemble feature ranking, cost-based feature selection, threshold selection, filter methods
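The idea of trading cost against relevance over non-overlapping feature groups can be illustrated with a small sketch. This is not the method proposed in the study, whose details the abstract does not give; the relevance-per-cost ratio, the greedy budget rule, the group names, the scores, and the costs below are all assumptions chosen only to make the trade-off concrete.

```python
def rank_groups(groups, feature_scores, group_costs):
    """Rank non-overlapping feature groups by mean relevance per unit cost.

    Assumed scoring rule for illustration only.
    """
    ranked = []
    for name, features in groups.items():
        relevance = sum(feature_scores[f] for f in features) / len(features)
        ranked.append((name, relevance / group_costs[name]))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def select_within_budget(ranked, group_costs, budget):
    """Greedily take groups in ranked order while the budget allows."""
    chosen, spent = [], 0.0
    for name, _ in ranked:
        cost = group_costs[name]
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen


# Hypothetical groups: each group is one test that yields several features.
groups = {
    "urine_test": ["albumin", "sugar"],
    "blood_test": ["creatinine", "hemoglobin"],
    "vitals": ["blood_pressure"],
}
# Hypothetical relevance scores, e.g. from an ensemble of filter rankers.
feature_scores = {
    "albumin": 0.6,
    "sugar": 0.2,
    "creatinine": 0.9,
    "hemoglobin": 0.7,
    "blood_pressure": 0.3,
}
# Hypothetical cost of ordering each test, in arbitrary units.
group_costs = {"urine_test": 2.0, "blood_test": 5.0, "vitals": 1.0}

ranked_groups = rank_groups(groups, feature_scores, group_costs)
affordable = select_within_budget(ranked_groups, group_costs, budget=3.0)
```

With these made-up numbers the most relevant group (the blood test) is also the most expensive and falls outside the budget, which shows why a cost-aware ranking can differ from a purely accuracy-driven one.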