Data models built for analyzing student data often obscure temporal relationships, either for simplicity or to aid generalization. We present a model that uses temporal relationships in heterogeneous data as the basis for building predictive models. We show how within- and between-semester temporal patterns can provide insight into the student experience. For example, in a within-semester model, the prediction of the final course grade can be based on weekly activities and submissions recorded in the LMS. In the between-semester model, the prediction of success or failure in a degree program can be based on sequence patterns of grades and activities across multiple semesters. The benefits of our sequence data model include temporal structure, segmentation, contextualization, and storytelling. To demonstrate these benefits, we collected and analyzed 10 years of student data from the College of Computing at UNC Charlotte in a between-semester sequence model, and used data from an introductory computer science course to build a within-semester sequence model. Our results for the two sequence models show that analytics based on the sequence data model can achieve higher predictive accuracy than non-temporal models trained on the same data.
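As a rough illustration of the within-semester idea, the sketch below turns a flat log of LMS events into one fixed-length sequence of weekly activity counts per student. It is not the authors' implementation: the function name `weekly_sequences`, the `(student_id, event_date)` record layout, and the sample dates are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


def weekly_sequences(events, semester_start, n_weeks):
    """Group (student_id, event_date) pairs into weekly activity counts.

    Hypothetical helper: the record layout is assumed, not taken from the paper.
    Events that fall outside the semester window are ignored.
    """
    seqs = defaultdict(lambda: [0] * n_weeks)
    for student_id, event_date in events:
        week = (event_date - semester_start).days // 7  # 0-based week index
        if 0 <= week < n_weeks:
            seqs[student_id][week] += 1
    return dict(seqs)


# Made-up events for two students in a three-week window.
events = [
    ("s1", date(2024, 1, 8)),
    ("s1", date(2024, 1, 10)),
    ("s1", date(2024, 1, 23)),
    ("s2", date(2024, 1, 16)),
]
sequences = weekly_sequences(events, semester_start=date(2024, 1, 8), n_weeks=3)
print(sequences)  # {'s1': [2, 0, 1], 's2': [0, 1, 0]}
```

Keeping the week order intact is the point of such a representation: a non-temporal model would see only the totals (3 and 1 here), whereas a sequence model can also see when in the semester the activity happened.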
Background: Society always has limited resources to spend on health care, as on anything else. What are the unmet medical needs? How do we allocate limited resources to maximize the health and welfare of the people? These challenging questions might be re-examined systematically within an infodemiological frame on a much larger scale, leveraging the latest advances in information technology and data science.
Objective: We expanded our previous work by investigating news media data to reveal the coverage of different diseases and medical conditions, together with the sentiments and topics of the news articles about them, over three decades. We were motivated to do so because news media play a significant role in politics and affect public policy making.
Methods: We analyzed over 3.5 million archived news articles from Reuters from the periods 1996/1997, 2008, and 2016, using summary statistics, sentiment analysis, and topic modeling. Summary statistics illustrated the coverage of various diseases and medical conditions over the last three decades. Sentiment analysis and topic modeling helped us automatically detect the sentiments of news articles (ie, positive versus negative) and the topics (ie, a series of keywords) associated with each disease over time.
Results: The percentages of news articles mentioning diseases and medical conditions were 0.44%, 0.57%, and 0.81% in the three time periods, suggesting that the news media or the public has gradually increased its interest in medicine since 1996. Diseases such as other malignant neoplasm (34%), other infectious diseases (20%), and influenza (11%) were the most covered. Two hundred and twenty-six diseases and medical conditions (97.8%) were found to have neutral or negative sentiments in the news articles. Using topic modeling, we identified meaningful topics on these diseases and medical conditions. For instance, the smoking theme appeared in the news articles on other malignant neoplasm only during 1996/1997. The topic phrases HIV and Zika virus were linked to other infectious diseases during 1996/1997 and 2016, respectively.
Conclusions: The multi-dimensional analysis of news media data allows the discovery of the focus, sentiments, and topics of news media with respect to diseases and medical conditions. These infodemiological discoveries could shed light on unmet medical needs and research priorities for the future and provide guidance for decision making in public policy.
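The coverage percentages reported above are a simple summary statistic: the share of articles in a period that mention at least one disease term. A minimal sketch of that calculation follows, assuming naive case-insensitive substring matching. The abstract does not say how mentions were detected, so the matching rule, the function name `mention_share`, the term list, and the toy headlines are all illustrative assumptions.

```python
def mention_share(articles, terms):
    """Return the percentage of articles containing at least one term.

    Assumption: a mention is a case-insensitive substring match, which is
    cruder than whatever matching the study actually used.
    """
    if not articles:
        return 0.0
    lowered_terms = [t.lower() for t in terms]
    hits = sum(
        any(term in text.lower() for term in lowered_terms)
        for text in articles
    )
    return 100.0 * hits / len(articles)


# Toy corpus (invented): one of four articles mentions a listed term.
articles = [
    "Markets rally as oil prices fall",
    "Influenza season arrives early this year",
    "Election results delayed in two states",
    "Central bank holds interest rates",
]
share = mention_share(articles, ["influenza", "HIV", "Zika virus"])
print(share)  # 25.0
```

Running the same function separately on each period's articles would give one percentage per period, which is the shape of the 0.44%, 0.57%, and 0.81% figures.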
Binge drinking is a severe health problem faced by many US colleges and universities. College students often post drinking-related text and images on social media, portraying their alcohol use as socially desirable. In this project, we investigated the feasibility of mining heterogeneous data (e.g., text, images, and videos) on Facebook to identify drinking-related content. We manually annotated 4266 posts made between 21 October 2011 and 3 November 2014 in the "I'm Shmacked" group on Facebook, of which 511 were drinking-related. Our machine learning models show that, by combining heterogeneous data types, we were able to identify drinking-related posts with an F1-score of 0.81. Prediction models built on text data were more reliable than those built on image and video data for predicting drinking-related content. As the first step of our efforts in this direction, this feasibility study showed promise toward unleashing the potential of mining social media to identify students who binge drink.
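Only 511 of the 4266 annotated posts (about 12%) are drinking-related, so plain accuracy would be a weak measure here: a classifier that labels every post as unrelated would still be right about 88% of the time. The F1-score, the harmonic mean of precision and recall on the positive class, avoids that problem. The sketch below computes it from confusion counts. The counts in the example are invented for illustration and are not the study's results.

```python
def f1_score(tp, fp, fn):
    """F1 for the positive class from true positives, false positives,
    and false negatives. Returns 0.0 when precision and recall are both 0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: precision and recall are both 0.8, so F1 is 0.8
# (up to floating-point rounding).
example_f1 = f1_score(tp=8, fp=2, fn=2)
print(round(example_f1, 2))  # 0.8
```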