Background: Patients with certain diseases are less likely to approach the healthcare system but remain active on social media. Young patients with Social Anxiety Disorder (SAD), in particular, are a hard-to-reach population because of disease symptomatology, unmet need, and age-related barriers, which makes first-hand access to patient perspectives challenging.

Objective: To create a curated cohort of social media users who report an age of 13 to 25 years and confirm having a SAD diagnosis or having received therapy for SAD, and to assess the value of the content posted by these users for observational studies of SAD.

Methods: We collected 535k posts by 118k Reddit users from the r/SocialAnxiety subreddit. We then developed precise regular expressions to extract age, diagnosis, and therapy mentions. We manually annotated the full set of extracted expressions and double-annotated 5% of the age mentions and 10% of the diagnosis and therapy mentions. Using a similar methodology, we identified mentions of comorbidities and substance use.

Results: Our validated cohort includes 37,073 posts by 1,102 users who meet the inclusion criteria. The age, diagnosis, and therapy mention detection had a precision of 68%, 31%, and 44%, respectively, with an inter-annotator agreement of 0.96, 0.96, and 0.78. Sixty-one percent of the users in the cohort report having one or more comorbidities in addition to their SAD diagnosis (Fleiss’s Kappa=0.79), and 13% report a concerning use of drugs or alcohol (Fleiss’s Kappa=0.87). We compared the characteristics of our social media cohort to the published literature on SAD.

Conclusions: Patients with SAD post actively on Reddit, and their perspectives can be captured and studied directly from these data. Extracting age, therapy, substance abuse, and comorbidities (and potentially other patient data) can address real-world data source biases. Thus, social media is a valuable source for creating cohorts of hard-to-reach patient populations that may not enter the healthcare system.
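The extraction step described in the Methods can be illustrated with a minimal Python sketch. The patterns, function names, and example sentences below are hypothetical and written only for illustration; the abstract does not give the study's actual regular expressions, which are likely more extensive. The `precision` helper assumes each extracted mention has been manually labeled as correct or incorrect, as in the annotation step.

```python
import re

# Hypothetical pattern for self-reported age (e.g. "I'm 19", "I am 19 years old").
# Not the study's actual expression.
AGE_PATTERN = re.compile(
    r"\b(?:i\s*am|i['’]m|im)\s+(\d{1,2})(?:\s*(?:years?\s*old|yo|y/o))?\b",
    re.IGNORECASE,
)

# Hypothetical pattern for a self-reported SAD diagnosis.
DIAGNOSIS_PATTERN = re.compile(
    r"\b(?:i\s+was|i\s+got|i['’]ve\s+been|i\s+have\s+been)\s+"
    r"(?:officially\s+|formally\s+)?diagnosed\s+with\s+"
    r"(?:social\s+anxiety|sad)\b",
    re.IGNORECASE,
)


def extract_age(text, low=13, high=25):
    """Return the first self-reported age within [low, high], or None."""
    for match in AGE_PATTERN.finditer(text):
        age = int(match.group(1))
        if low <= age <= high:
            return age
    return None


def mentions_diagnosis(text):
    """Return True if the text contains a self-reported diagnosis mention."""
    return DIAGNOSIS_PATTERN.search(text) is not None


def precision(labels):
    """Fraction of extracted mentions that annotators judged correct."""
    return sum(labels) / len(labels) if labels else 0.0


if __name__ == "__main__":
    post = "I'm 19 and I was diagnosed with social anxiety last year."
    print(extract_age(post), mentions_diagnosis(post))
    # Annotator judgments for four extracted mentions (illustrative values only).
    print(precision([True, True, False, True]))
```

A pattern this simple also matches phrases such as "I'm 20 minutes late", which is one reason extracted mentions need manual validation before users are admitted to a cohort.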