2021
DOI: 10.2196/25807
|View full text |Cite|
|
Sign up to set email alerts
|

Predicting Age Groups of Reddit Users Based on Posting Behavior and Metadata: Classification Model Development and Validation

Abstract: Background Social media are important for monitoring perceptions of public health issues and for educating target audiences about health; however, limited information about the demographics of social media users makes it challenging to identify conversations among target audiences and limits how well social media can be used for public health surveillance and education outreach efforts. Certain social media platforms provide demographic information on followers of a user account, if given, but they… Show more

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
2

Citation Types

0
9
0

Year Published

2022
2022
2024
2024

Publication Types

Select...
3
1
1

Relationship

1
4

Authors

Journals

citations
Cited by 8 publications
(9 citation statements)
references
References 37 publications
0
9
0
Order By: Relevance
“…A previously developed age prediction model was used to predict the age group for each author as either UA (13–20 years), OLA (21–54 years), or uncertain. 11 These categories were used to examine the differences in conversations depending on whether the user was OLA to use tobacco. The lower bound was selected because Reddit users must be aged ≥13 years, and those aged >54 years could not be appropriately classified because of the small number of individuals who fell into this category during the development of the model.…”
Section: Methodsmentioning
confidence: 99%
See 3 more Smart Citations
“…A previously developed age prediction model was used to predict the age group for each author as either UA (13–20 years), OLA (21–54 years), or uncertain. 11 These categories were used to examine the differences in conversations depending on whether the user was OLA to use tobacco. The lower bound was selected because Reddit users must be aged ≥13 years, and those aged >54 years could not be appropriately classified because of the small number of individuals who fell into this category during the development of the model.…”
Section: Methodsmentioning
confidence: 99%
“…A full list of the variables used in the model, as well as further background on other variables considered, variable importance, and model performance, can be found in Chew et al. 11 Because the model does not produce perfect predictions (test set F1 score, ∼0.79), we reduced the likelihood that the model returned false positives by only considering predictions with a predicted probability >0.6 for either age group. This process of rejecting predictions for which the model is most uncertain is referred to as classification with a reject option 13 in the literature.…”
Section: Methodsmentioning
confidence: 99%
See 2 more Smart Citations
“…A byproduct of this process is an increasing number of identity claims, in which a person shares a specific facet of their identity with a new audience. Social media sites are scattered with identity claims, as people seek to provide information about their identity as context; in lieu of traditional reporting mechanisms like surveys, identity claims found using pattern matching have been used to label users with demographic information such as gender and age [1][2][3] and mental health status [4][5][6].…”
Section: Introductionmentioning
confidence: 99%