It is widely considered that approximately 10% of the population suffers from type 2 diabetes. Unfortunately, the impact of this disease is underestimated. Patient mortality often results from complications caused by the disease rather than from the disease itself. Many disease-modeling techniques take the form of a “black box” whose internal workings and complexities are extremely difficult to understand, from both practitioners' and patients' perspectives. In this work, we address this issue and present an informative model/pattern, known as a “latent phenotype,” that aims to capture the complexities of the associated complications over time. We further extend this idea by combining temporal association rule mining and unsupervised learning to find explainable subgroups of patients with more personalized prediction. Our extensive findings show how uncovering the latent phenotype aids in distinguishing the disparities among subgroups of patients based on their complication patterns. We gain insight into how best to enhance the prediction performance and reduce bias in the models applied using uncertainty in the patients' data.
This research aims to explore how to enhance student engagement in higher education institutions (HEIs) while using a novel conversational system (chatbots). The principal research methodology for this study is design science research (DSR), which is executed in three iterations: personas elicitation, a survey and development of student engagement factor models (SEFMs), and chatbot interaction analysis. This paper focuses on the first iteration, personas elicitation, which proposes a data-driven persona development method (DDPDM) that utilises machine learning, specifically the K-means clustering technique. Data analysis is conducted using two datasets. Three methods are used to find candidate values of K: the elbow, gap statistic, and silhouette methods. Subsequently, the silhouette coefficient is used to find the optimal value of K. Eight personas are produced from the two data analyses. The pragmatic findings from this study make two contributions to the current literature. Firstly, the proposed DDPDM uses machine learning, specifically K-means clustering, to build data-driven personas. Secondly, the persona template is designed for university students, which supports the construction of data-driven personas. Future work will cover the second and third iterations: building SEFMs, building tailored interaction models for these personas, and then evaluating them using chatbot technology.
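As a rough illustration of choosing K with the silhouette coefficient, the sketch below runs a plain K-means for several candidate values of K and keeps the one with the highest mean silhouette. It is not the study's implementation: the function names, the deterministic farthest-point seeding, and the use of Euclidean distance are all assumptions made for this example, and the elbow and gap statistic methods are not shown.

```python
from math import dist  # Euclidean distance, Python 3.8+


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the first
    point, then repeatedly add the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def silhouette_score(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0 by convention.
    Assumes at least two clusters and not all points identical."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton contributes 0
        a = sum(dist(p, q) for q in own) / (len(own) - 1)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, candidates):
    """Return the candidate K with the highest mean silhouette, plus all scores."""
    scores = {k: silhouette_score(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores
```

Calling `best_k(points, range(2, 5))` on a list of numeric feature tuples returns the selected K and the per-K scores, which can then be inspected alongside other criteria before personas are built from the clusters.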
Context: Conducting experiments is central to machine learning research in order to benchmark, evaluate and compare learning algorithms. Consequently, it is important that we conduct reliable, trustworthy experiments. Objective: We investigate the incidence of errors in a sample of machine learning experiments in the domain of software defect prediction. Our focus is simple arithmetical and statistical errors. Method: We analyse 49 papers describing 2456 individual experimental results from a previously undertaken systematic review comparing supervised and unsupervised defect prediction classifiers. We extract the confusion matrices and test for relevant constraints, e.g., the marginal probabilities must sum to one. We also check for multiple statistical significance testing errors. Results: We find that a total of 22 out of 49 papers contain demonstrable errors. Of these, 7 were statistical and 16 related to confusion matrix inconsistency (one paper contained both classes of error). Conclusions: Whilst some errors may be of a relatively trivial nature, e.g., transcription errors, their presence does not engender confidence. We strongly urge researchers to follow open science principles so that errors can be more easily detected and corrected and thus, as a community, reduce this worryingly high error rate in our computational experiments.
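The kind of constraint checking described here can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function names, the choice of metrics, and the rounding tolerance of 0.005 (suited to values published to two decimal places) are assumptions.

```python
def recompute_metrics(tp, fp, fn, tn):
    """Recompute common metrics from the four confusion matrix counts."""
    metrics = {"accuracy": (tp + tn) / (tp + fp + fn + tn)}
    if tp + fp:
        metrics["precision"] = tp / (tp + fp)
    if tp + fn:
        metrics["recall"] = tp / (tp + fn)
    if 2 * tp + fp + fn:
        metrics["f1"] = 2 * tp / (2 * tp + fp + fn)
    return metrics


def find_inconsistencies(cells, reported, tol=0.005):
    """List reported metrics that disagree with the confusion matrix.

    cells    -- dict with integer counts under keys tp, fp, fn, tn
    reported -- dict of published metric values, e.g. {"recall": 0.67}
    tol      -- allowance for rounding in the published values (assumed)
    """
    if min(cells.values()) < 0:
        return ["negative cell count"]
    if sum(cells.values()) == 0:
        return ["empty confusion matrix"]
    problems = []
    recomputed = recompute_metrics(**cells)
    for name, value in reported.items():
        expected = recomputed.get(name)
        if expected is None:
            continue  # undefined for this matrix, or not covered by this sketch
        if abs(expected - value) > tol + 1e-12:
            problems.append(f"{name}: reported {value}, recomputed {expected:.3f}")
    return problems


def marginals_sum_to_one(proportions, tol=0.005):
    """Check that cell proportions (tp, fp, fn, tn as fractions) sum to one."""
    return abs(sum(proportions) - 1.0) <= tol
```

An empty list from `find_inconsistencies` means only that no contradiction was found among the values supplied; it does not show that the underlying experiment is correct.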
Clinicians predict disease and related complications based on prior knowledge and each individual patient's clinical history. The prediction process is complex due to the existence of unmeasured risk factors, the unexpected development of complications and varying responses of patients to disease over time. Exploiting these unmeasured risk factors (hidden variables) can improve the modeling of disease progression and thus enable clinicians to focus on early diagnosis and treatment of unexpected conditions. However, the overuse of hidden variables can lead to complex models that can overfit and are not well understood (being 'black box' in nature). Identifying and understanding groups of patients with similar disease profiles (based on discovered hidden variables) makes it possible to better understand disease progression in different patients while improving prediction. We explore the use of a stepwise method for incrementally identifying hidden variables based on the Inductive Causation (IC*) algorithm. We exploit Dynamic Time Warping and hierarchical clustering to cluster patients based upon these hidden variables to uncover their meaning with respect to the complications of Type 2 Diabetes Mellitus patients. Our results reveal that inferring a small number of targeted hidden variables and using them to cluster patients not only leads to an improvement in the prediction accuracy but also assists the explanation of different discovered sub-groups.
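The clustering step can be illustrated with a minimal sketch that computes Dynamic Time Warping distances between per-patient trajectories and then merges patients agglomeratively. The function names, the absolute-difference local cost, and the average-linkage rule are assumptions made for this example; the abstract does not state which variants were used, and the IC* step that infers the hidden variables is not shown.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences,
    using absolute difference as the local cost (an assumed choice)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def agglomerative_clusters(series, n_clusters):
    """Average-linkage hierarchical clustering on pairwise DTW distances.
    Returns clusters as sorted lists of indices into `series`."""
    size = len(series)
    d = [[dtw_distance(series[i], series[j]) for j in range(size)] for i in range(size)]
    clusters = [[i] for i in range(size)]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # average distance over all cross-cluster pairs
                link = sum(d[i][j] for i in clusters[x] for j in clusters[y]) / (
                    len(clusters[x]) * len(clusters[y])
                )
                if best is None or link < best[0]:
                    best = (link, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]
    return [sorted(c) for c in clusters]
```

DTW lets trajectories of different lengths be compared, which suits patients with differing numbers of visits; each resulting cluster can then be inspected against the recorded complications to interpret what the hidden variable captures.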