2022
DOI: 10.2196/35446

Unsupervised Machine Learning to Detect and Characterize Barriers to Pre-exposure Prophylaxis Therapy: Multiplatform Social Media Study

Abstract: Background: Among racial and ethnic minority groups, the risk of HIV infection is an ongoing public health challenge. Pre-exposure prophylaxis (PrEP) is highly effective for preventing HIV when taken as prescribed. However, there is a need to understand the experiences, attitudes, and barriers of PrEP for racial and ethnic minority populations and sexual minority groups. Objective: This infodemiology study aimed to leverage big data and unsupervised machine learning to identify, characterize, and elucidate expe…

Cited by 6 publications
(10 citation statements)
References 39 publications
“…Exceptions to this were two studies by Young et al [21, 22•] that drew exclusively on Facebook data collected with consent from a cohort of sexual minority men, and one study each that leveraged data from Reddit [23] and Baidu Tieba [24], a Chinese social media platform. Three more studies incorporated data from multiple platforms [25–27] (i.e., various combinations of data from Twitter, Reddit, Instagram, YouTube, and Tumblr). Irrespective of the platform, 20 of the 21 social media studies analyzed post content.…”
Section: Results (mentioning)
confidence: 99%
“…We set a total number of 20 different clusters (ie, total number of topics for BTM to output: k=20), resulting in texts with similar themes put into the same clusters. To find the appropriate k value, we used a topic coherence score [21, 24]. Coherence score is used to measure the performance of a topic model with different number of clusters and can help differentiate between topics that are semantically interpretable and topics that are artifacts of statistical inference [24, 25].…”
Section: Methods (mentioning)
confidence: 99%
“…To find the appropriate k value, we used a topic coherence score [21, 24]. Coherence score is used to measure the performance of a topic model with different number of clusters and can help differentiate between topics that are semantically interpretable and topics that are artifacts of statistical inference [24, 25]. We tested 5 different k values (k=10, 20, 30, 40, and 50) for each data set and found that when k=20, we generated the highest coherence score, and this score did not change significantly with an increase in the k value.…”
Section: Methods (mentioning)
confidence: 99%
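
The k-selection step quoted above can be sketched in a few lines. This is only an illustration: the quoted statement does not say which coherence measure was used, so the sketch assumes a UMass-style document co-occurrence score, and the names `umass_coherence` and `pick_k` are hypothetical helpers, not part of any BTM library.

```python
import math


def umass_coherence(top_words, documents, eps=1.0):
    """UMass-style coherence for one topic (assumed measure, see lead-in).

    top_words: the topic's top words, ranked from most to least probable.
    documents: tokenized texts, each given as a list of words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                continue  # skip words that never occur in the corpus
            co = doc_freq(top_words[i], top_words[j])
            score += math.log((co + eps) / d_j)
    return score


def pick_k(coherence_by_k):
    """Return the k with the highest coherence; ties go to the smaller k."""
    return max(sorted(coherence_by_k), key=lambda k: coherence_by_k[k])
```

In practice one would fit a model for each candidate k (for example 10, 20, 30, 40, and 50), average `umass_coherence` over that model's topics, and pass the resulting mapping to `pick_k`. Breaking ties toward the smaller k mirrors the quoted reasoning that the score did not improve meaningfully beyond k=20.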
“…BTM can be used to sort short text into highly prevalent themes without the need for predetermined training data and has been previously used for exploration of other public health topics [19–24]. The methodological approach of using BTM for detection of HIV and PrEP-related topics is also detailed in a separate published paper [25].…”
Section: Methods (mentioning)
confidence: 99%
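
For readers unfamiliar with BTM (the biterm topic model), its basic unit is the biterm: an unordered pair of words that co-occur in the same short text. The sketch below shows only that extraction step, assuming the posts are already tokenized; `extract_biterms` and `corpus_biterm_counts` are hypothetical names, and the topic inference itself is not shown.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text."""
    # Sorting each pair makes ("prep", "cost") and ("cost", "prep") identical.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Count biterms across a corpus of tokenized short texts."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts
```

Because the biterms are pooled over the whole corpus, the model learns from corpus-level co-occurrence rather than from the sparse word counts of a single post, which is why the approach suits short social media text.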
“…A general deductive coding schema using the socio-ecological perspective outline (SEPO) [26] that outlines three intervention levels for PrEP including the “Individual and Relationships Domains: Provider Level”, “Individual and Relationships Domains: Patient Level” and “Community Domains: Healthcare-System Level”, were selected as parent codes as used in a prior PrEP infodemiology study [25]. All posts were reviewed by first and second author, and notes were taken on general themes of posts from which an initial code list was created.…”
Section: Methods (mentioning)
confidence: 99%