Feature selection aims to eliminate redundant or irrelevant variables from input data in order to reduce computational cost, provide a better understanding of the data and improve prediction accuracy. The majority of existing filter methods rely on a single feature-ranking technique, which may overlook important assumptions about the underlying regression function linking the input variables with the output. In this paper, we propose a novel feature selection framework that combines clustering of variables with multiple feature-ranking techniques to select an optimal feature subset. Different feature-ranking methods typically select different subsets, since each method makes its own assumption about the regression function linking the input variables with the output. We therefore employ multiple feature-ranking methods with disjoint assumptions about the regression function. The proposed approach has a feature-ranking module to identify relevant features and a clustering module to eliminate redundant features. First, input variables are ranked using the feature weights (model coefficients or importance scores) obtained by training L1-regularized Logistic Regression, Support Vector Machine and Random Forest models. Features ranked below a certain threshold are filtered out. The remaining features are grouped into clusters using an exemplar-based clustering algorithm, which identifies the data points that best exemplify the data and associates each data point with an exemplar. We use both linear correlation coefficients and information gain to measure the association between a data point and its corresponding exemplar. From each cluster, the highest-ranked feature is selected as a delegate, and the delegates from all three ranked lists are combined into the final feature set using a union operation.
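The ranking, filtering, clustering and union steps described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: affinity propagation (`sklearn.cluster.AffinityPropagation`) stands in for the exemplar-based clustering algorithm, the absolute Pearson correlation between feature columns is used as the similarity (the linear-correlation variant), and the ranking threshold is expressed as a hypothetical `keep_fraction` parameter that keeps the top-ranked half of the features. The Wisconsin Breast Cancer data bundled with scikit-learn serves only as a convenient example, and the helper names `rank_scores`, `select_delegates` and `multi_filter_select` are hypothetical.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


def rank_scores(X, y, seed=0):
    """Return one nonnegative relevance score per feature for each of the three rankers."""
    l1_lr = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, y)
    svm = LinearSVC(C=1.0, dual=False, max_iter=5000).fit(X, y)
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    return {
        # Linear models: magnitude of the coefficients (summed over classes if multiclass).
        "l1_logreg": np.abs(l1_lr.coef_).sum(axis=0),
        "linear_svm": np.abs(svm.coef_).sum(axis=0),
        # Random forest: impurity-based feature importances.
        "random_forest": rf.feature_importances_,
    }


def select_delegates(X, scores, keep_fraction=0.5, seed=0):
    """Drop low-ranked features, cluster the rest, keep the top-ranked feature per cluster."""
    order = np.argsort(scores)[::-1]  # feature indices, best first
    n_keep = max(1, int(np.ceil(keep_fraction * len(order))))
    kept = order[:n_keep]  # hypothetical threshold: keep the top fraction
    if n_keep == 1:
        return {int(kept[0])}
    # Similarity between two features = |Pearson correlation| of their columns.
    sim = np.abs(np.corrcoef(X[:, kept], rowvar=False))
    ap = AffinityPropagation(affinity="precomputed", damping=0.7,
                             max_iter=1000, random_state=seed).fit(sim)
    delegates = set()
    for label in np.unique(ap.labels_):
        members = kept[ap.labels_ == label]
        # The highest-ranked member of each cluster becomes its delegate.
        delegates.add(int(members[np.argmax(scores[members])]))
    return delegates


def multi_filter_select(X, y, keep_fraction=0.5, seed=0):
    """Union of the delegate sets produced by each ranker."""
    X = StandardScaler().fit_transform(X)
    selected = set()
    for scores in rank_scores(X, y, seed).values():
        selected |= select_delegates(X, scores, keep_fraction, seed)
    return sorted(selected)


if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    print(multi_filter_select(X, y))
```

Each ranker yields its own delegate set, so the union returned by `multi_filter_select` can contain more features than any single ranker keeps. For the information-gain variant, the correlation matrix would be replaced by a pairwise mutual-information estimate (for example, one built from `sklearn.feature_selection.mutual_info_regression`); that substitution is left out here to keep the sketch short.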
Empirical results on a number of real-world data sets confirm the hypothesis that combining features selected by multiple heterogeneous methods yields a more robust feature set and improves prediction accuracy. Compared with the other feature selection approaches evaluated, features selected using linear correlation-based multi-filter feature selection achieved the best classification accuracy: 98.7%, 100%, 92.3% and 100% on the Ionosphere, Wisconsin Breast Cancer, Sonar and Wine data sets, respectively.
Purpose: The adoption and success rates of E-learning Environments and Services (ELES) challenge ELES designers, practitioners and organisations. Enterprise decision makers continue to seek effective instruments for launching such systems. This study aims to understand users' perceptions of ELES effectiveness and develops a theoretical framework that improves understanding of the success factors for adoption.
Design/Methodology/Approach: The Grounded Theory Method (GTM) is used to reflect on the relationships between changing user requirements and expectations, technological advances and ELES effectiveness models. In a four-year longitudinal study, data were collected from social media blogs and authenticated based on context evaluation, language structure and conversational constructs.
Findings: The study identifies a new core dimension, named "Concept Functionality", which can be used to understand the relationships between E-learning effectiveness factors, including their relationships with other domains such as security. The findings are also used to validate major existing models of ELES success.
Practical Implications: The new framework can potentially improve the system design process in fields such as education technology and enterprise systems.
Originality/Value: The Concept Functionality dimension can offer further insight into ELES effectiveness and help improve the system design process in a variety of domains, including enterprise systems, process modelling and education technology.
Handwritten character recognition has been studied extensively for many years in the field of pattern recognition. Owing to its wide range of practical applications and its financial implications, it remains an important research area. In this research, a Handwritten Ethiopian Character Recognition (HECR) dataset is prepared to train a model. The images in the HECR dataset were written with pens of more than one color, stored in the RGB color space and size-normalized to 28 × 28 pixels. The dataset combines scripts (Fidel in Ethiopia), numerals, punctuation marks, tonal symbols, combining symbols and special characters. These scripts have been used to record the ancient history, science and arts of Ethiopia and Eritrea. In this study, a hybrid model of two strong classifiers, a Convolutional Neural Network (CNN) and eXtreme Gradient Boosting (XGBoost), is proposed for classification. In this integrated model, the CNN works as a trainable automatic feature extractor from the raw images, and XGBoost takes the extracted features as input for recognition and classification. The error rates of the hybrid model and of a CNN with a fully connected layer are compared: error rates of 0.4630 and 0.1612, respectively, were achieved in classifying the handwritten test images. XGBoost as a classifier gave better results than the traditional fully connected layer.