Keith Harrigian scite author profile

Keith Harrigian

12 Publications

101 Citation Statements Received

364 Citation Statements Given

[Charts: How they've been cited; How they cite others]

Affiliations

Johns Hopkins University

Publications

Do Models of Mental Health Based on Social Media Data Generalize?

Harrigian, Aguirre, Dredze

2020

Proxy-based methods for annotating mental health status in social media have grown popular in computational research due to their ability to gather large training samples. However, an emerging body of literature has raised new concerns regarding the validity of these types of methods for use in clinical applications. To further understand the robustness of distantly supervised mental health models, we explore the generalization ability of machine learning classifiers trained to detect depression in individuals across multiple social media platforms. Our experiments not only reveal that substantial performance loss occurs when transferring between platforms, but also that there exist several unreliable confounding factors that may enable researchers to overestimate classification performance. Based on these results, we enumerate recommendations for future mental health dataset construction.

Gender and Racial Fairness in Depression Research using Social Media

Aguirre, Harrigian, Dredze

2021

Multiple studies have demonstrated that behavior on internet-based social media platforms can be indicative of an individual's mental health status. The widespread availability of such data has spurred interest in mental health research through a computational lens. While previous research has raised concerns about possible biases in models produced from this data, no study has quantified how these biases actually manifest themselves with respect to different demographic groups, such as gender and racial/ethnic groups. Here, we analyze the fairness of depression classifiers trained on Twitter data with respect to gender and racial demographic groups. We find that model performance systematically differs for underrepresented groups and that these discrepancies cannot be fully explained by trivial data representation issues. Our study concludes with recommendations on how to avoid these biases in future research.

Geocoding Without Geotags: A Text-based Approach for reddit

Harrigian

2018

In this paper, we introduce the first geolocation inference approach for reddit, a social media platform where user pseudonymity has thus far made supervised demographic inference difficult to implement and validate. In particular, we design a text-based heuristic schema to generate ground truth location labels for reddit users in the absence of explicitly geotagged data. After evaluating the accuracy of our labeling procedure, we train and test several geolocation inference models across our reddit data set and three benchmark Twitter geolocation data sets. Ultimately, we show that geolocation models trained and applied on the same domain substantially outperform models attempting to transfer training data across domains, even more so on reddit where platform-specific interest-group metadata can be used to improve inferences.

Health Disparities in Lapses in Diabetic Retinopathy Care

Cai, Tran, Tang, et al.

2023

Ophthalmology Science

On the State of Social Media Data for Mental Health Research

Harrigian, Aguirre, Dredze

2021

Data-driven methods for mental health treatment and surveillance have become a major focus in computational science research in the last decade. However, progress in the domain remains bounded by the availability of adequate data. Prior systematic reviews have not necessarily made it possible to measure the degree to which data-related challenges have affected research progress. In this paper, we offer an analysis specifically on the state of social media data that exists for conducting mental health research. We do so by introducing an open-source directory of mental health datasets, annotated using a standardized schema to facilitate meta-analysis.

On the State of Social Media Data for Mental Health Research

Harrigian, Aguirre, Dredze

2020

Preprint

Data-driven methods for mental health treatment and surveillance have become a major focus in computational science research in the last decade. However, progress in the domain, in terms of both medical understanding and system performance, remains bounded by the availability of adequate data. Prior systematic reviews have not necessarily made it possible to measure the degree to which data-related challenges have affected research progress. In this paper, we offer an analysis specifically on the state of social media data that exists for conducting mental health research. We do so by introducing an open-source directory of mental health datasets, annotated using a standardized schema to facilitate meta-analysis.

The Problem of Semantic Shift in Longitudinal Monitoring of Social Media

Harrigian, Dredze

2022

Social media allows researchers to track societal and cultural changes over time based on language analysis tools. Many of these tools rely on statistical algorithms which need to be tuned to specific types of language. Recent studies have shown that the absence of appropriate tuning, specifically in the presence of semantic shift, can hinder the robustness of the underlying methods. However, little is known about the practical effect this sensitivity may have on downstream longitudinal analyses. We explore this gap in the literature through a timely case study: understanding shifts in depression during the course of the COVID-19 pandemic. We find that inclusion of only a small number of semantically-unstable features can promote significant changes in longitudinal estimates of our target outcome. At the same time, we demonstrate that a recently-introduced method for measuring semantic shift may be used to proactively identify failure points of language-based models and, in turn, improve predictive generalization.

Geocoding Without Geotags: A Text-based Approach for reddit

Harrigian

2018

Preprint
