Experiments with LDA and Top2Vec for embedded topic discovery on social media data—A case study of cystic fibrosis

Karas, Bradley; Qu, Sue; Xu, Yanji; Zhu, Qian

doi:10.3389/frai.2022.948313

Cited by 8 publications

(4 citation statements)

References 29 publications

(35 reference statements)

“…To further contextualize results with a more recent baseline, Top2Vec (Angelov, 2020; Karas et al, 2022) was applied to Cp and Cn. Top2Vec uses document and word semantic embeddings (Gutiérrez & Keith, 2019) to automatically identify the salient topics in a group of documents without specifying the number of topics to expect, as is the case with LDA (Blei, Ng & Jordan, 2003).…”

Section: Methods (mentioning)

confidence: 99%

Insights into the nutritional prevention of macular degeneration based on a comparative topic modeling approach

Jacaruso

2024

PeerJ Computer Science

Topic modeling and text mining are subsets of natural language processing (NLP) with relevance for conducting meta-analysis (MA) and systematic review (SR). For evidence synthesis, the above NLP methods are conventionally used for topic-specific literature searches or extracting values from reports to automate essential phases of SR and MA. Instead, this work proposes a comparative topic modeling approach to analyze reports of contradictory results on the same general research question. Specifically, the objective is to identify topics exhibiting distinct associations with significant results for an outcome of interest by ranking them according to their proportional occurrence in (and consistency of distribution across) reports of significant effects. Macular degeneration (MD) is a disease that affects millions of people annually, causing vision loss. Augmenting evidence synthesis to provide insight into MD prevention is therefore of central interest in this article. The proposed method was tested on broad-scope studies addressing whether supplemental nutritional compounds significantly benefit macular degeneration. Six compounds were identified as having a particular association with reports of significant results for benefiting MD. Four of these were further supported in terms of effectiveness upon conducting a follow-up literature search for validation (omega-3 fatty acids, copper, zeaxanthin, and nitrates). The two not supported by the follow-up literature search (niacin and molybdenum) also had scores in the lowest range under the proposed scoring system. Results therefore suggest that the proposed method’s score for a given topic may be a viable proxy for its degree of association with the outcome of interest, and can be helpful in the systematic search for potentially causal relationships. 
Further, the compounds identified by the proposed method were not simultaneously captured as salient topics by state-of-the-art topic models that leverage document and word embeddings (Top2Vec) and transformer models (BERTopic). These results underpin the proposed method’s potential to add specificity in understanding effects from broad-scope reports, elucidate topics of interest for future research, and guide evidence synthesis in a scalable way. All of this is accomplished while yielding valuable and actionable insights into the prevention of MD.

“…The authors of [12] used BERTopic to developed document embedding with pre-trained transformer-based language models, clustered embeddings, and generated topic representations with the class-based TF-IDF procedure for building neural networks. The authors of [13] implemented the Top2Vec model with doc2vec as the embedding model as their final model to extract topics from a subreddit of CF ("r/CysticFibrosis"). Many studies utilize LDA due to its popularity and simplicity.…”

Section: Topic Modeling For Public Health (mentioning)

confidence: 99%
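
The class-based TF-IDF step mentioned in the statement above can be illustrated with a small sketch. It assumes the commonly cited form of the weighting, tf(t, c) * log(1 + A / tf(t)), where A is the average number of words per class, and it works on raw counts. The cluster labels, the example posts, and the helper names `class_tfidf` and `top_terms` are hypothetical and come from neither the cited paper nor the BERTopic library.

```python
import math
from collections import Counter


def class_tfidf(docs_by_class):
    """Score terms per class as tf(t, c) * log(1 + A / tf(t)) on raw counts."""
    # Treat all documents of one class (cluster) as a single bag of words.
    class_counts = {
        label: Counter(word for doc in docs for word in doc.lower().split())
        for label, docs in docs_by_class.items()
    }
    # tf(t): frequency of each term across all classes.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)
    # A: average number of words per class.
    avg_words = sum(total_counts.values()) / len(class_counts)
    return {
        label: {
            term: tf * math.log(1 + avg_words / total_counts[term])
            for term, tf in counts.items()
        }
        for label, counts in class_counts.items()
    }


def top_terms(scores, n=2):
    """Return the n highest-scoring terms per class (ties broken alphabetically)."""
    return {
        label: sorted(s, key=lambda term: (-s[term], term))[:n]
        for label, s in scores.items()
    }


# Hypothetical clusters of short posts, invented for illustration only.
example_clusters = {
    "treatment": ["new enzyme therapy helps", "enzyme dose and therapy schedule"],
    "insurance": ["insurance denied my claim", "appeal the insurance claim"],
}
ctfidf_scores = class_tfidf(example_clusters)
print(top_terms(ctfidf_scores))
```

Terms that are frequent inside one cluster but rare overall receive the highest scores, which is why they serve as a compact topic representation.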

Emotional Health and Climate-Change-Related Stressor Extraction from Social Media: A Case Study Using Hurricane Harvey

Bui, Hannah, Madria et al.

2023

Mathematics

Climate change has led to a variety of disasters that have caused damage to infrastructure and the economy with societal impacts to human living. Understanding people’s emotions and stressors during disaster times will enable preparation strategies for mitigating further consequences. In this paper, we mine emotions and stressors encountered by people and shared on Twitter during Hurricane Harvey in 2017 as a showcase. In this work, we acquired a dataset of tweets from Twitter on Hurricane Harvey from 20 August 2017 to 30 August 2017. The dataset consists of around 400,000 tweets and is available on Kaggle. Next, a BERT-based model is employed to predict emotions associated with tweets posted by users. Then, natural language processing (NLP) techniques are utilized on negative-emotion tweets to explore the trends and prevalence of the topics discussed during the disaster event. Using Latent Dirichlet Allocation (LDA) topic modeling, we identified themes, enabling us to manually extract stressors termed as climate-change-related stressors. Results show that 20 climate-change-related stressors were extracted and that emotions peaked during the deadliest phase of the disaster. This indicates that tracking emotions may be a useful approach for studying environmentally determined well-being outcomes in light of understanding climate change impacts.

“…It is an unsupervised topic modeling (i.e., clustering documents to topics) (Eykens, Guns, & Vanderstraeten, 2022) which means that it does not require any preset number of clusters. The logic behind this algorithm is described as follows: (1) it takes input texts and converts each of them into a vector in semantic space, (2) once the documents embedded into vectoral space, it finds dense cluster of documents through computing the distance between vectors, (3) identify the words pulled those documents together (Angelov, 2020; Eykens, Guns, & Vanderstraeten; Karas, Qu, Xu, & Zhu, 2022).…”

Section: Instruments and Tools (mentioning)

confidence: 99%
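
The three steps listed in the statement above (embed, find dense groups, pick the words that characterize each group) can be sketched in a few lines of plain Python. This is a toy illustration, not the Top2Vec implementation: it swaps in normalized bag-of-words vectors for learned document embeddings and a greedy cosine-similarity grouping for density-based clustering. The example posts, the threshold value, and all function names are assumptions made for this example.

```python
import math
from collections import Counter


def embed(doc, vocab):
    """Step 1 (stand-in): map a document to a unit-length bag-of-words vector."""
    counts = Counter(doc.lower().split())
    vec = [counts[w] for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def centroid(vectors):
    """Mean of the vectors, rescaled to unit length."""
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]


def group_documents(vectors, threshold=0.3):
    """Step 2 (stand-in): greedily add each document to the first group whose
    centroid is similar enough; otherwise start a new group. The number of
    groups is not fixed in advance."""
    groups = []
    for i, vec in enumerate(vectors):
        for members in groups:
            if cosine(vec, centroid([vectors[j] for j in members])) >= threshold:
                members.append(i)
                break
        else:
            groups.append([i])
    return groups


def topic_words(members, vectors, vocab, n=2):
    """Step 3: words with the largest weight in the group centroid."""
    center = centroid([vectors[j] for j in members])
    ranked = sorted(zip(vocab, center), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in ranked[:n]]


# Hypothetical posts, invented for illustration only.
posts = [
    "enzyme therapy dose",
    "insurance claim denied",
    "enzyme therapy schedule",
    "insurance claim appeal",
]
vocab = sorted({w for p in posts for w in p.lower().split()})
vectors = [embed(p, vocab) for p in posts]
groups = group_documents(vectors)
topics = [topic_words(g, vectors, vocab) for g in groups]
print(groups, topics)
```

Because groups are created only when a document is far from every existing centroid, the number of topics emerges from the data, which mirrors the property the citing authors highlight.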

“…Due to exponential production of texts, thanks to rapid development of information and networks, clustering and classifying huge volume of text and topics of documents without relying on human resources (i.e., domain specialist) became evident (Chang, Yu, Chang, & Yu, 2021). Because of saving time and human resource, it would be wise to use topic modeling, which is one of the natural language processing methods to identify hidden topics within the documents (Karas, Qu, Xu, & Zhu, 2022).…”

Section: Introduction (mentioning)

confidence: 99%

Modeling Education Studies Indexed in Web of Science Using Natural Language Processing

Akbay

2022

ITALL

Easier access to information and resources has allowed researchers to conduct more studies and publish most of them electronically. They are indexed in scholarly citation databases such as Web of Science and Scopus. These databases index huge volumes of research reports. Even though they offer search engine filtering options, it is still hard to locate publications whose contents are closely related. Artificial intelligence technologies, such as Natural Language Processing, allow documents to be categorized based on their content. Top2Vec is an unsupervised topic modeling algorithm that enables users to categorize documents semantically. The purpose of the current study is twofold: (1) to provide users with the ability to group documents by applying Natural Language Processing techniques, and (2) to reveal the topics with the highest number of articles indexed in the ‘education scientific disciplines’ category within the Web of Science Core Collection scholarly database in 2021. A Colab notebook was used to write the Python code that runs the Top2Vec algorithm. This study yielded 68 distinct topics among the 8125 articles published in 2021 and indexed in the Web of Science database under the Education Scientific Disciplines category. After the modeled topics were ranked from the topic with the largest number of documents (N=549) to the topic with the fewest (N=29), the findings for the first eight topics were presented and discussed. These eight most-studied topics are as follows: Physics (N=549), online education and covid (N=438), Chemistry (N=381), Math and Reasoning (N=377), Psychology and Emotions (N=257), Educational Diversity (N=228), Health and Life (N=223), Mentoring and Leadership (N=204).
