Social media is now being used to model mental well-being, and for understanding health outcomes. Computer scientists are now using quantitative techniques to predict the presence of specific mental disorders and symptomatology, such as depression, suicidality, and anxiety. This research promises great benefits to monitoring efforts, diagnostics, and intervention design for these mental health statuses. Yet, there is no standardized process for evaluating the validity of this research and the methods adopted in the design of these studies. We conduct a systematic literature review of the state-of-the-art in predicting mental health status using social media data, focusing on characteristics of the study design, methods, and research design. We find 75 studies in this area published between 2013 and 2018. Our results outline the methods of data annotation for mental health status, data collection and quality management, pre-processing and feature selection, and model selection and verification. Despite growing interest in this field, we identify concerning trends around construct validity, and a lack of reflection in the methods used to operationalize and identify mental health status. We provide some recommendations to address these challenges, including a list of proposed reporting standards for publications and collaboration opportunities in this interdisciplinary space.
Social media sites are challenged by both the scale and variety of deviant behavior online. While algorithms can detect spam and obscenity, behaviors that break community guidelines on some sites are difficult because they have multimodal subtleties (images and/or text). Identifying these posts is often regulated to a few moderators. In this paper, we develop a deep learning classifier that jointly models textual and visual characteristics of pro-eating disorder content that violates community guidelines. Using a million Tumblr photo posts, our classifier discovers deviant content efficiently while also maintaining high recall (85%). Our approach uses human sensitivity throughout to guide the creation, curation, and understanding of this approach to challenging, deviant content. We discuss how automation might impact community moderation, and the ethical and social obligations of this area.
Online communities can promote illness recovery and improve well-being in the cases of many kinds of illnesses. However, for challenging mental health condition like anorexia, social media harbor both recovery communities as well as those that encourage dangerous behaviors. The effectiveness of such platforms in promoting recovery despite housing both communities is underexplored. Our work begins to fill this gap by developing a statistical framework using survival analysis and situating our results within the cognitive behavioral theory of anorexia. This model identifies content and participation measures that predict the likelihood of recovery. From our dataset of over 68M posts and 10K users that self-identify with anorexia, we find that recovery on Tumblr is protracted - only half of the population is estimated to exhibit signs of recovery after four years. We discuss the effectiveness of social media in improving well-being around anorexia, a unique health challenge, and emergent questions from this line of work.
Human-centered machine learning" (HCML) combines human insights and domain expertise with datadriven predictions to answer societal questions. This area's inherent interdisciplinarity causes tensions in the obligations researchers have to the humans whose data they use. This paper studies how scientific papers represent human research subjects in HCML. Using mental health status prediction on social media as a case study, we conduct thematic discourse analysis on 55 papers to examine these representations. We identify five discourses that weave a complex narrative of who the human subject is in this research: Disorder/Patient, Social Media, Scientific, Data/Machine Learning, and Person. We show how these five discourses create paradoxical subject and object representations of the human, which may inadvertently risk dehumanization. We also discuss the tensions and impacts of interdisciplinary research; the risks of this work to scientific rigor, online communities, and mental health; and guidelines for stronger HCML research in this nascent area.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.