Analyzing large datasets with many rows and columns increases the load on machine learning algorithms. Such data are likely to contain noise, which degrades the performance of these algorithms. Feature selection (FS) is one of the most essential machine learning techniques for addressing this problem. It tries to identify and eliminate as much irrelevant information as possible and to retain only a minimal subset of relevant features. It plays an important role in improving the accuracy of machine learning algorithms, and it also reduces computational complexity, run time, storage, and cost. In this paper, a new feature selection algorithm based on feature stability and correlation is proposed to select an effective minimal subset of relevant features. The efficiency of the proposed algorithm was evaluated by comparing it with other state-of-the-art dimensionality reduction (DR) algorithms on benchmark datasets. The evaluation criteria were the size of the minimal subset, the classification accuracy, the F-measure, and the area under the curve (AUC). The results showed that the proposed algorithm leads the compared algorithms in reducing a given dataset while maintaining high predictive accuracy.
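The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of a generic correlation-based filter, not the authors' method. It keeps features that correlate with the target and drops features that are nearly duplicates of one already kept. The function names, the two thresholds, and the toy data are all assumptions made for illustration; the stability component mentioned in the abstract is not modeled here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(columns, target, min_relevance=0.1, max_redundancy=0.9):
    """Greedy filter (hypothetical thresholds): keep relevant, non-redundant features.

    columns: dict mapping feature name -> list of numeric values.
    target:  list of numeric class labels.
    """
    relevance = {name: abs(pearson(values, target)) for name, values in columns.items()}
    # Visit features from most to least correlated with the target.
    ranked = sorted(columns, key=lambda name: relevance[name], reverse=True)
    selected = []
    for name in ranked:
        if relevance[name] < min_relevance:
            continue  # too weakly related to the target
        redundant = any(
            abs(pearson(columns[name], columns[kept])) >= max_redundancy
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Toy data (invented): f2 duplicates f1 up to scale, f3 is constant.
target = [0, 0, 1, 1, 0, 1]
columns = {
    "f1": [1, 2, 8, 9, 1, 7],
    "f2": [2, 4, 16, 18, 2, 14],
    "f3": [5, 5, 5, 5, 5, 5],
    "f4": [0, 1, 0, 1, 0, 1],
}
selected = select_features(columns, target)
print(selected)
```

On this toy table the filter keeps one of the two duplicated features plus `f4`, and discards the constant `f3`, which illustrates the "minimal subset" idea on a very small scale.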
The literature focuses on identifying factors that influence students' initial choice of major; few studies have examined students' involvement after registering in a selected major, and this study is one of them. This study aims to determine the important factors that influence high school students' choice of major using data mining techniques. A questionnaire was designed to collect data from students at different universities in Kuwait and in different faculties, such as science, literature, medicine, and engineering. Rough set theory for feature selection was used to highlight and explain the significant factors, related to students' awareness of their skills and preferences and to reflection on their experience, that are responsible for their satisfaction with their choice of university major. The findings revealed that the calculated reducts have a significant influence on students' choice of university and college major. This research contributes to the literature by identifying the relationship between the conditional factors of the reduct (also known as the independent variables) and the classification attribute (also known as the dependent variable). The results give high school students valuable information about the majors that best suit their skills, preferences, and experiences. This research also helps students avoid repeatedly changing majors because of a poor initial choice that leaves them dissatisfied.
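To make the notion of a reduct concrete, here is a minimal brute-force sketch of rough set attribute reduction on a toy decision table. It is not the study's implementation: the attribute names (`skills`, `preference`, `experience`, `satisfied`) and the six rows are invented, and exhaustive search is only feasible for a handful of attributes.

```python
from itertools import combinations


def dependency(rows, attrs, decision):
    """Rough set dependency degree: the share of rows whose decision value
    is uniquely determined by their values on `attrs` (the positive region)."""
    classes = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(row[decision])
    consistent = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return consistent / len(rows)


def minimal_reducts(rows, conditions, decision):
    """Return all smallest attribute subsets that preserve the dependency
    degree of the full condition set (exhaustive search, small tables only)."""
    full = dependency(rows, conditions, decision)
    for size in range(1, len(conditions) + 1):
        found = [
            subset
            for subset in combinations(conditions, size)
            if dependency(rows, subset, decision) == full
        ]
        if found:
            return found
    return []


# Hypothetical questionnaire rows: condition attributes plus a decision attribute.
rows = [
    {"skills": "high", "preference": "yes", "experience": "good", "satisfied": "yes"},
    {"skills": "high", "preference": "no", "experience": "good", "satisfied": "no"},
    {"skills": "low", "preference": "yes", "experience": "poor", "satisfied": "no"},
    {"skills": "low", "preference": "no", "experience": "good", "satisfied": "no"},
    {"skills": "high", "preference": "yes", "experience": "poor", "satisfied": "yes"},
    {"skills": "low", "preference": "yes", "experience": "good", "satisfied": "no"},
]
conditions = ["skills", "preference", "experience"]
reducts = minimal_reducts(rows, conditions, "satisfied")
print(reducts)  # [('skills', 'preference')]
```

In this toy table no single attribute determines the decision, but `skills` and `preference` together do, so `experience` can be dropped without losing any classification information.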
This research proposed an approach for determining the minimal set of important factors that influence the desire to learn the Korean language in the Gulf Cooperation Council (GCC) countries. These factors will then inform the marketing of the Korean language in the GCC by guiding interested people to strengthen their commercial abilities, improve their knowledge of Korean drama, and prepare to study in or travel to Korea. A total of 500 responses out of 526 questionnaires were used for the analysis. A hybrid reduction model that merges the weight-by-SVM and weight-guided feature selection (WGFS) techniques was proposed for the investigated dataset. Five different classifiers were used to test the results. Empirical results showed that the generated factors (the reduct) are highly significant for testing the ability or inability to learn the Korean language. SVM performed best, with an accuracy of 94%. This research contributes to the literature by highlighting the importance of the Korean language in the GCC and by presenting the important factors, both encouragements and obstacles, that influence learners of the Korean language. Moreover, the research identified the best-performing classifier.
This study applied a data mining process to provide insight into the factors that led to the adoption of online shopping in medium- and large-sized shopping centers (MLShCs) in Kuwait. The research focuses on proposing the high-quality environmental factors that affect the success of online shopping among MLShCs in Kuwait while disregarding low-impact factors that may be costly. It also examines customer behavior and customers' desire to buy directly from the physical store, from the online shopping website, or from both. This was achieved by distributing a questionnaire, collecting the data into a dataset, cleaning the data, reducing the dataset vertically by applying the rough set feature selection technique, and building a classification model. The result of this process is key to examining the success of online shopping in MLShCs. This study could serve as a decision aid for new investors in Kuwait who are considering establishing a new shopping business, advising them whether to opt for a physical store, an online shopping website, or both. It also advises current online shopping businesses to improve their infrastructure, website design and availability, and security. The proposed approach performs effectively and generates interesting results.
Data mining is the process of discovering or extracting information from large amounts of data stored in databases or datasets, such as a phishing dataset. Phishing is a serious web security problem that involves imitating legitimate websites to mislead online users and steal their sensitive information. This paper aims to detect and predict the type of a website, assigning it either a legitimate or a phishing class label. It investigates different data mining classifiers applied to the phishing dataset to determine the most effective ones in terms of classification performance. Nine classifiers were compared with the help of RapidMiner software. Five metrics were used to compare the results: accuracy, precision, recall, sensitivity, and F-measure. The study identified the classifiers that precisely recognize fake websites, especially with respect to the evolving nature of information attacks.
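As a rough illustration of how the listed metrics are derived from a classifier's predictions, the sketch below computes them from a binary confusion matrix. The function name, the label strings, and the example predictions are assumptions made for illustration; they are not taken from the paper or from RapidMiner. Note that for a binary problem, recall and sensitivity are the same quantity, TP / (TP + FN), so the sketch reports them under one key.

```python
def binary_metrics(y_true, y_pred, positive="phishing"):
    """Accuracy, precision, recall (= sensitivity) and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Invented labels: 4 phishing and 4 legitimate sites, with one miss and one false alarm.
y_true = ["phishing", "phishing", "phishing", "legitimate",
          "legitimate", "legitimate", "legitimate", "phishing"]
y_pred = ["phishing", "phishing", "legitimate", "legitimate",
          "legitimate", "phishing", "legitimate", "phishing"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Comparing several classifiers then amounts to calling such a function once per classifier on the same held-out labels and ranking the resulting dictionaries by the metric of interest.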