The analysis of medical data is a challenging task for health care systems since a huge amount of interesting knowledge can be automatically mined to effectively support both physicians and health care organizations. This paper proposes a data analysis framework based on a multiple-level clustering technique to identify the examination pathways commonly followed by patients with a given disease. This knowledge can support health care organizations in evaluating the medical treatments usually adopted, and thus the incurred costs. The proposed multiple-level strategy allows clustering patient examination datasets with a variable distribution. To measure the relevance of specific examinations for a given disease complication, patient examination data has been represented in the Vector Space Model using the TF-IDF method. As a case study, the proposed approach has been applied to the diabetic care scenario. The experimental validation, performed on a real collection of diabetic patients, demonstrates the effectiveness of the approach in identifying groups of patients with a similar examination history and increasing severity in diabetes complications.
Graph-based summarization entails extracting a worthwhile subset of sentences from a collection of textual documents by using a graph-based model to represent the correlations between pairs of document terms. However, since the high-order correlations among multiple terms are disregarded during graph evaluation, the summarization performance could be limited unless integrating ad-hoc language-dependent or semantics-based analysis. This paper presents a novel and general-purpose graph-based summarizer, namely GraphSum (Graph-based Summarizer). It discovers and exploits association rules to represent the correlations among multiple terms that have been neglected by previous approaches. The graph nodes, which represent combinations of two or more terms, are first ranked by means of a PageRank
The analysis of data generated by higher educational institutes has the potential of revealing interesting facets of student learning behavior. Classification is a popularly explored area in Educational Data Mining for predicting student performance. Using student behavioral data, this study compares the performance of a broad range of classification techniques to find a qualitative model for the prediction of student performance. Rebalancing of data has also been explored to verify if it leads to the creation of better classification models. The experimental results, validated using well-established evaluation matrices, presented potentially significant outcomes which may be used for reshaping the learning paradigm.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.