Forecasting student success is an important research subject for higher education institutions. It can enable teachers to prevent students from dropping out before final examinations and to identify those who need additional help, and it can boost an institution's ranking and prestige. Machine learning techniques in educational data mining aim to build models that discover meaningful hidden patterns and extract useful information from educational settings. The key traditional characteristics of students (demographic, academic background and behavioural features) are the essential factors that make up the training dataset for supervised machine learning algorithms. In this study, we compared the performance of several supervised machine learning algorithms, including Decision Tree, Naïve Bayes, Logistic Regression, Support Vector Machine, K-Nearest Neighbour, Sequential Minimal Optimisation and Neural Network. We trained models on datasets from courses in the bachelor's programmes of the College of Computer Science and Information Technology, University of Basra, for the academic years 2017–2018 and 2018–2019 to predict student performance in final examinations. The results indicated that the logistic regression classifier was the most accurate in predicting students' exact final grades (68.7% for passed and 88.8% for failed).
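The abstract reports separate accuracy figures for passed and failed students. A common reading of such figures is per-class accuracy (recall): the fraction of students in each true class whose outcome the model predicts correctly. The Python sketch below assumes that reading; the function names and the five labels are hypothetical and are not the study's data.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Fraction of each true class that was predicted correctly (per-class recall)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


def overall_accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical final-exam outcomes, for illustration only.
    y_true = ["pass", "pass", "pass", "fail", "fail"]
    y_pred = ["pass", "fail", "pass", "fail", "fail"]
    print(per_class_accuracy(y_true, y_pred))  # pass: 2 of 3 correct, fail: 2 of 2 correct
    print(overall_accuracy(y_true, y_pred))    # 4 of 5 correct overall
```

Reporting per-class figures alongside overall accuracy is useful when classes are imbalanced, because a high overall accuracy can hide poor performance on the smaller class.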
Clinical decisions are crucial because they affect human lives. Thus, managers and decision makers in the clinical environment seek new solutions that can support their decisions. A clinical data warehouse (CDW) is an important solution for achieving clinical stakeholders' goals: it merges heterogeneous data sources into a central repository and uses this repository to answer questions related to the strategic clinical domain, thereby supporting clinical decisions. CDW implementation faces numerous obstacles, from the data sources to the tools that present the clinical information. This paper presents a systematic overview of the purpose of CDWs as well as their characteristics; requirements; data sources; extract, transform and load (ETL) process; security and privacy concerns; design approach; architecture; and the challenges and difficulties of implementing a successful CDW. PubMed and Google Scholar were used to find papers related to CDWs. Of the 784 papers found, 42 were included in the literature review. These papers are classified from five perspectives, namely methodology, data, system, ETL tool and purpose, to derive insights into aspects of CDWs. This review can help answer questions related to CDWs and provides recommendations for implementing a successful CDW.
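The extract, transform and load (ETL) process named in this abstract can be illustrated with a minimal Python sketch that merges two heterogeneous sources into one central table. The source formats, field names, records and the `observation` table are all hypothetical, and an in-memory SQLite database stands in for the warehouse. A real CDW would also need data quality checks, de-identification and incremental loading.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical source A: a lab system export as CSV with day/month/year dates.
LAB_CSV = """patient_id,test,value,date
P001,glucose,5.4,03/01/2019
P002,glucose,7.9,04/01/2019
"""

# Hypothetical source B: ward records with different field names and ISO dates.
WARD_RECORDS = [
    {"pid": "p001", "measure": "heart_rate", "reading": "72", "taken": "2019-01-03"},
]


def extract():
    """Extract: read raw rows from each heterogeneous source."""
    lab_rows = list(csv.DictReader(io.StringIO(LAB_CSV)))
    return lab_rows, WARD_RECORDS


def transform(lab_rows, ward_rows):
    """Transform: map both sources onto one schema (patient, measure, value, ISO date)."""
    unified = []
    for r in lab_rows:
        date = datetime.strptime(r["date"], "%d/%m/%Y").date().isoformat()
        unified.append((r["patient_id"].upper(), r["test"], float(r["value"]), date))
    for r in ward_rows:
        unified.append((r["pid"].upper(), r["measure"], float(r["reading"]), r["taken"]))
    return unified


def load(rows, conn):
    """Load: insert the unified rows into a central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observation "
        "(patient TEXT, measure TEXT, value REAL, obs_date TEXT)"
    )
    conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(*extract()), conn)
    query = "SELECT patient, COUNT(*) FROM observation GROUP BY patient ORDER BY patient"
    for row in conn.execute(query):
        print(row)  # P001 now has records from both sources
```

Normalising identifiers (here, upper-casing patient IDs) and dates during the transform step is what lets records from different systems be joined on the same patient.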
Following the outbreak of the novel coronavirus, countries around the world have fought to combat the spread of infection and imposed preventive measures that compel the population to practise social distancing, all of which led to a global crisis. Effective strategies must be studied and identified to prevent and control the spread of coronavirus disease 2019 (COVID-19). In this paper, the effect of preventive strategies on the spread of COVID-19 was studied, a model based on supervised data mining algorithms was presented and the best algorithm was suggested on the basis of accuracy. In this model, three classifiers (Naive Bayes, Multilayer Perceptron and J48) were applied to questionnaires completed by respondents in Basra City. The questionnaire consisted of 25 questions covering the fields most related to, and most affecting, the prevention of COVID-19 spread, including demographic, psychological, health management, cognitive, awareness and preventive factors. Responses were collected from a total of 1017 respondents. The model was developed using the Weka 3.8 tool. The results showed that quarantine played an important role in controlling the spread of the disease. A comparison of the accuracy of the algorithms used showed J48 to be the best algorithm.
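J48 is Weka's implementation of the C4.5 decision tree algorithm, which chooses splitting attributes by gain ratio. The Python sketch below computes the gain ratio for categorical questionnaire answers. The attributes (`quarantine`, `age_group`) and the responses are invented for illustration. J48 itself adds handling of numeric attributes, missing values and pruning, none of which is shown here.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (bits) of a list of categorical values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio of one categorical attribute with respect to the class labels."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: class entropy minus the weighted entropy after splitting.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information (entropy of the attribute itself) penalises many-valued attributes.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical responses and outcomes, not the survey's data.
    outcome = ["low", "low", "low", "high", "high", "low"]
    attributes = {
        "quarantine": ["yes", "yes", "yes", "no", "no", "no"],
        "age_group": ["young", "old", "young", "old", "young", "old"],
    }
    for name, values in attributes.items():
        print(name, round(gain_ratio(values, outcome), 3))
    best = max(attributes, key=lambda name: gain_ratio(attributes[name], outcome))
    print("root split:", best)  # quarantine separates the classes better here
```

In this toy data, `quarantine` partly separates the outcomes while `age_group` does not separate them at all, so a C4.5-style tree would split on `quarantine` first.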
Academic institutions constantly seek a solid platform to support their short- to long-term decisions related to academic performance. Such platforms use historical data to inform strategic decisions, but discovering the hidden patterns in these data requires dedicated tools and approaches. This paper presents a short roadmap for implementing an educational data mart based on a dataset from Alexandria Private Elementary School, located in the Basrah province of Iraq, for the 2017–2018 academic year. After the educational data mart is implemented, a cube is constructed to perform OLAP operations and present OLAP reports. Next, OLAP mining is performed on the educational cube using nine algorithms, namely: decision tree with score method (entropy) and split method (complete), decision tree with score method (entropy) and split method (complete), decision tree with score method (entropy) and split method (both), logistic regression, Naïve Bayes, Neural Network, clustering with expectation maximization, clustering with K-means, and association rules mining. A comparison of all the algorithms showed that clustering with expectation maximization achieved the highest accuracy: 96.76% for predicting students' performance and 96.12% for predicting students' grades.
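OLAP operations on a cube amount to aggregating a measure over chosen dimensions: grouping by fewer dimensions rolls up, grouping by more drills down, and fixing a dimension value slices. The Python sketch below assumes a hypothetical fact table with grade level, subject and term dimensions and an average-score measure. It is not the paper's cube design, which the abstract does not describe.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fact rows: (grade_level, subject, term, score). Not the school's data.
FACTS = [
    ("Grade 1", "Math", "Term 1", 80),
    ("Grade 1", "Math", "Term 2", 90),
    ("Grade 1", "Science", "Term 1", 70),
    ("Grade 2", "Math", "Term 1", 60),
    ("Grade 2", "Science", "Term 2", 85),
]
DIMENSIONS = ("grade_level", "subject", "term")


def aggregate(facts, group_by, where=None):
    """Average the score over the `group_by` dimensions; `where` slices on fixed values."""
    where = where or {}
    idx = {d: i for i, d in enumerate(DIMENSIONS)}
    cells = defaultdict(list)
    for row in facts:
        if all(row[idx[d]] == v for d, v in where.items()):
            key = tuple(row[idx[d]] for d in group_by)
            cells[key].append(row[-1])
    return {key: mean(scores) for key, scores in sorted(cells.items())}


if __name__ == "__main__":
    print(aggregate(FACTS, ["grade_level", "subject"]))  # drill-down level
    print(aggregate(FACTS, ["grade_level"]))             # roll-up: Grade 1 -> 80, Grade 2 -> 72.5
    print(aggregate(FACTS, ["grade_level"], where={"term": "Term 1"}))  # slice on one term
    print(aggregate(FACTS, []))                           # apex of the cube: overall average
```

A production data mart would typically precompute such aggregates in the OLAP server rather than scanning the fact table on every query.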
Prediction in data mining is a sophisticated task that is conducted in various disciplines. Given that the overall success of educational institutions can be measured by their students' success, many studies are dedicated to predicting it. This paper presents a model for predicting student success based on Bayes algorithms and suggests the best algorithm based on performance details. Two Bayes algorithms (naïve Bayes and Bayes network) were applied in this model to students' questionnaire answers. The questionnaire consists of 62 questions covering the fields that most affect students' performance: health, social activity, relationships and academic performance. The questionnaire was built with Google Forms and the open-source application LimeSurvey, and a total of 161 student responses were collected. The model was built using the Weka 3.8 tool. The overall model design process can be divided into two stages: the first finds the questions most correlated with the final class, and the second applies the algorithms and identifies the optimal one. The two Bayes algorithms are compared based on performance details, and the naïve Bayes algorithm is finally selected as the optimal choice for predicting student success.
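A naïve Bayes classifier for categorical questionnaire answers can be sketched in a few lines. It scores each class by its prior multiplied by the per-question answer probabilities given that class, computed here in log space with Laplace (add-one) smoothing so that an unseen answer does not zero out a class. The questions, answers and labels below are hypothetical, and this is a generic sketch, not Weka's implementation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each question received each answer."""
    class_counts = Counter(labels)
    answer_counts = defaultdict(Counter)  # (question_index, label) -> Counter of answers
    answer_values = defaultdict(set)      # question_index -> distinct answers seen
    for row, y in zip(rows, labels):
        for q, answer in enumerate(row):
            answer_counts[(q, y)][answer] += 1
            answer_values[q].add(answer)
    return class_counts, answer_counts, answer_values


def predict(model, row):
    """Return the class with the highest log posterior, using add-one smoothing."""
    class_counts, answer_counts, answer_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)
        for q, answer in enumerate(row):
            k = len(answer_values[q])
            score += math.log((answer_counts[(q, y)][answer] + 1) / (n_y + k))
        if score > best_score:
            best_label, best_score = y, score
    return best_label


if __name__ == "__main__":
    # Hypothetical answers to (studies_daily, sleeps_well, part_time_job).
    rows = [
        ("yes", "yes", "no"), ("yes", "no", "no"), ("yes", "yes", "yes"),
        ("no", "no", "yes"), ("no", "yes", "yes"), ("no", "no", "no"),
    ]
    labels = ["pass", "pass", "pass", "fail", "fail", "fail"]
    model = train_naive_bayes(rows, labels)
    print(predict(model, ("yes", "yes", "no")))  # pass
    print(predict(model, ("no", "no", "yes")))   # fail
```

Working in log space avoids numerical underflow when many questions are multiplied together, which matters for a questionnaire with dozens of items.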