2022
DOI: 10.3389/frai.2022.948313

Experiments with LDA and Top2Vec for embedded topic discovery on social media data—A case study of cystic fibrosis

Abstract: Social media has become an important resource for discussing, sharing, and seeking information pertinent to rare diseases by patients and their families, given the low prevalence in the extraordinarily sparse populations. In our previous study, we identified prevalent topics from Reddit via topic modeling for cystic fibrosis (CF). While we were able to derive/access concerns/needs/questions of patients with CF, we observed challenges and issues with the traditional techniques of topic modeling, e.g., Latent Di…

Cited by 8 publications (4 citation statements)
References 29 publications (35 reference statements)

“…To further contextualize results with a more recent baseline, Top2Vec ( Angelov, 2020 ; Karas et al, 2022 ) was applied to Cp and Cn . Top2Vec uses document and word semantic embeddings ( Gutiérrez & Keith, 2019 ) to automatically identify the salient topics in a group of documents without specifying the number of topics to expect, as is the case with LDA ( Blei, Ng & Jordan, 2003 ).…”
Section: Methods (mentioning)
confidence: 99%
“…The authors of [12] used BERTopic to developed document embedding with pre-trained transformer-based language models, clustered embeddings, and generated topic representations with the class-based TF-IDF procedure for building neural networks. The authors of [13] implemented the Top2Vec model with doc2vec as the embedding model as their final model to extract topics from a subreddit of CF ("r/CysticFibrosis"). Many studies utilize LDA due to its popularity and simplicity.…”
Section: Topic Modeling For Public Health (mentioning)
confidence: 99%
“…It is an unsupervised topic modeling (i.e., clustering documents to topics) (Eykens, Guns, & Vanderstraeten, 2022) which means that it does not require any preset number of clusters. The logic behind this algorithm is described as follows: (1) it takes input texts and converts each of them into a vector in semantic space, (2) once the documents embedded into vectoral space, it finds dense cluster of documents through computing the distance between vectors, (3) identify the words pulled those documents together (Angelov, 2020;Eykens, Guns, & Vanderstraeten;Karas, Qu, Xu, & Zhu, 2022).…”
Section: Instruments and Tools (mentioning)
confidence: 99%
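
The three steps described in the statement above (embed each document, find dense groups of document vectors, pick the words that characterize each group) can be sketched roughly as follows. This is a toy illustration, not Top2Vec itself: normalized word counts stand in for learned doc2vec embeddings, a greedy cosine-similarity threshold stands in for density-based clustering, and every function name, the sample documents, and the `threshold` parameter are hypothetical. The point it tries to show is that the number of topics is not passed in; it falls out of how the documents group.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def embed(doc, vocab):
    """Step 1 (toy stand-in): map a document to a unit word-count vector."""
    counts = Counter(doc.lower().split())
    return normalize([float(counts[w]) for w in vocab])


def centroid(vectors):
    """Unit-length mean of a list of equally sized vectors."""
    n = len(vectors)
    return normalize([sum(col) / n for col in zip(*vectors)])


def cosine(a, b):
    """Cosine similarity of two unit vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def cluster(vectors, threshold):
    """Step 2 (toy stand-in): greedily group documents by similarity.

    A document joins the first cluster whose centroid is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, centroid([vectors[j] for j in members])) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def topic_words(topic_vector, vocab, top_n):
    """Step 3: rank words by closeness to the topic vector.

    In this toy space each word is a one-hot vector, so its similarity to
    the topic vector is simply the matching component.
    """
    ranked = sorted(zip(vocab, topic_vector), key=lambda p: (-p[1], p[0]))
    return [word for word, _ in ranked[:top_n]]


def discover_topics(docs, threshold=0.5, top_n=3):
    """Run the three steps; returns a list of (member_indices, words)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = [embed(d, vocab) for d in docs]
    result = []
    for members in cluster(vectors, threshold):
        topic_vector = centroid([vectors[j] for j in members])
        result.append((members, topic_words(topic_vector, vocab, top_n)))
    return result


# Made-up sample posts, for illustration only.
sample_docs = [
    "enzymes help digestion enzymes",
    "digestion enzymes dosage",
    "lung transplant recovery",
    "transplant recovery lung function",
]

for members, words in discover_topics(sample_docs):
    print(members, words)
```

Raising `threshold` in this sketch splits the documents into more, tighter groups, and lowering it merges them, which loosely mirrors how density-based methods let the data decide the topic count rather than fixing it in advance as LDA does.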
“…Due to exponential production of texts, thanks to rapid development of information and networks, clustering and classifying huge volume of text and topics of documents without relying on human resources (i.e., domain specialist) became evident (Chang, Yu, Chang, & Yu, 2021). Because of saving time and human resource, it would be wise to use topic modeling, which is one of the natural language processing methods to identify hidden topics within the documents (Karas, Qu, Xu, & Zhu, 2022).…”
Section: Introduction (mentioning)
confidence: 99%