J Med Internet Res

2021

DOI: 10.2196/26310


Cancer Communication and User Engagement on Chinese Social Media: Content Analysis and Topic Modeling Study

Pianpian Wang et al.

Abstract: Background: Cancer ranks among the most serious public health challenges worldwide. In China, the world’s most populous country, about one-quarter of the population consists of people with cancer. Social media has become an important platform that the Chinese public uses to express opinions. Objective: We investigated cancer-related discussions on the Chinese social media platform Weibo (Sina Corporation) to identify cancer topics that generate the highest …


Citation types: Supporting 0, Mentioning 8, Contrasting 0


Cited by 16 publications (10 citation statements)

References 29 publications (31 reference statements)


“…Before the analysis, we followed the standard preprocessing procedures designed in previous studies [37, 38] to clean the data using Python 3.0 (Python Software Foundation) and to perform word part-of-speech tagging and text processing using the Python library spaCy [39, 40]. Through data cleaning, we converted the words in the reviews into lowercase words; removed stop words, punctuation, numbers, and nonword characters; and stemmed the remaining text [41]. To generate more interpretable topics of high quality, we restricted the parts of speech of words to “noun” (NOUN), “verb” (VERB), “adjective” (ADJ), or “proper noun” (PROPN).…”

Section: Methods (mentioning)

confidence: 99%
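
The cleaning steps quoted above can be sketched roughly as follows. This is not the citing authors' code: it assumes the tokens have already been part-of-speech tagged (for example by spaCy) and arrive as hypothetical `(text, pos)` pairs, it uses a tiny made-up stop-word list, and it leaves out the stemming step, which would need a real stemmer.

```python
import re

# Parts of speech the quoted passage says were kept.
KEEP_POS = {"NOUN", "VERB", "ADJ", "PROPN"}

# Tiny illustrative stop-word list (an assumption, not the list used in the study).
STOP_WORDS = {"the", "a", "is", "and", "it", "to", "me"}


def clean_tokens(tagged_tokens, stop_words=STOP_WORDS, keep_pos=KEEP_POS):
    """Lowercase tokens, then drop stop words, non-alphabetic tokens,
    and any token whose part of speech is not in keep_pos."""
    cleaned = []
    for text, pos in tagged_tokens:
        word = text.lower()
        if pos not in keep_pos:
            continue  # e.g. determiners, pronouns, punctuation, numerals
        if word in stop_words:
            continue
        if not re.fullmatch(r"[a-z]+", word):
            continue  # numbers and nonword characters
        cleaned.append(word)
    return cleaned


# Hypothetical pre-tagged review, invented for illustration.
review = [
    ("The", "DET"), ("app", "NOUN"), ("helps", "VERB"), ("me", "PRON"),
    ("calm", "VERB"), ("down", "ADP"), ("!", "PUNCT"), ("5", "NUM"),
    ("stars", "NOUN"), ("Wysa", "PROPN"), ("is", "AUX"), ("great", "ADJ"),
]
print(clean_tokens(review))
# → ['app', 'helps', 'calm', 'stars', 'wysa', 'great']
```

Filtering on the part-of-speech tag first means most function words are removed even if the stop-word list misses them.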

“…The statistical methods of unsupervised TM algorithms (which do not need prior labeling or annotations of the documents) were designed to analyze the words (terms) of the original texts to identify the themes (topics) running through a corpus [42, 43]. These algorithms allow users to organize and summarize numerous documents that cannot be annotated manually [41], thereby revealing the hidden topics in the documents [43]. We adopted the LDA TM technique, which assumes that texts are generated from a mixture of topics [44].…”

Section: Methods (mentioning)

confidence: 99%
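
To make the "mixture of topics" assumption concrete, here is a minimal sketch of the generative story behind LDA. For each word position a topic is drawn from the document's topic mixture, then a word is drawn from that topic's term distribution. The topic names, vocabularies, and probabilities are invented for illustration. Full LDA also draws both distributions from Dirichlet priors, whereas this sketch simply holds them fixed.

```python
import random


def generate_document(theta, phi, length, rng):
    """Generate `length` words: pick a topic from the document's topic
    mixture `theta`, then a word from that topic's term distribution `phi`."""
    topics = list(theta)
    topic_weights = [theta[t] for t in topics]
    words = []
    for _ in range(length):
        topic = rng.choices(topics, weights=topic_weights)[0]
        vocab = list(phi[topic])
        word = rng.choices(vocab, weights=[phi[topic][w] for w in vocab])[0]
        words.append(word)
    return words


# Hypothetical document-topic mixture and topic-term distributions.
theta = {"calming": 0.7, "therapy": 0.3}
phi = {
    "calming": {"calm": 0.5, "breathe": 0.3, "sleep": 0.2},
    "therapy": {"therapist": 0.6, "session": 0.4},
}

rng = random.Random(0)  # seeded so repeated runs give the same document
print(generate_document(theta, phi, length=8, rng=rng))
```

Fitting LDA runs this story in reverse: given only the observed words, it estimates the topic mixtures and term distributions that could plausibly have produced them.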

“…LDA is efficient and can generate topics of better quality [45]. From the data set created, we generated 2 probability distribution outputs: the probability distribution of topics over documents and the probability distribution of terms over topics [41, 43]. The number of topics was determined by repeating the analysis with different numbers of topics and by comparing the perplexity of each analysis [41].…”

Section: Methods (mentioning)

confidence: 99%
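
As a rough illustration of how the two distributions relate to perplexity, the sketch below computes perplexity as exp(-(total log-likelihood) / (number of words)), with p(word | document) obtained by summing topic probability times term probability over topics. The function name and toy inputs are assumptions made for this example. Library implementations typically report a variational bound instead, so their numbers will not match this exact formula.

```python
import math


def perplexity(docs, theta, phi):
    """docs: list of token lists; theta[d][k]: probability of topic k in
    document d; phi[k][w]: probability of term w under topic k."""
    total_log_prob = 0.0
    total_words = 0
    for d, tokens in enumerate(docs):
        for w in tokens:
            # Marginalise over topics to get p(w | d).
            p_w = sum(theta[d][k] * phi[k].get(w, 0.0) for k in range(len(phi)))
            total_log_prob += math.log(p_w)
            total_words += 1
    return math.exp(-total_log_prob / total_words)


# Toy check: a model that spreads probability uniformly over 4 terms
# should have a perplexity equal to the vocabulary size.
uniform = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
print(perplexity([["a", "b", "c"]], theta=[[0.5, 0.5]], phi=[uniform, uniform]))
```

Perplexity can be read as the effective number of equally likely terms the model is choosing among at each word, which is why lower values indicate a better fit.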

“…From the data set created, we generated 2 probability distribution outputs: the probability distribution of topics over documents and the probability distribution of terms over topics [41, 43]. The number of topics was determined by repeating the analysis with different numbers of topics and by comparing the perplexity of each analysis [41]. A lower perplexity value indicates a better model fit [44], and the perplexity value decreases with the increase in the number of topics [41].…”

Section: Methods (mentioning)

confidence: 99%

“…The number of topics was determined by repeating the analysis with different numbers of topics and by comparing the perplexity of each analysis [41]. A lower perplexity value indicates a better model fit [44], and the perplexity value decreases with the increase in the number of topics [41]. Both the simplicity and the interpretability of the textual content need to be considered in choosing the optimal number of topics [38].…”

Section: Methods (mentioning)

confidence: 99%
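
Because perplexity tends to keep falling as topics are added, simply taking the minimum would favour the largest model. One possible way to balance fit against simplicity, offered here only as an illustrative heuristic and not as the procedure used in the cited study, is to stop at the last topic count that still improves perplexity by a meaningful fraction. The function name, the 5% threshold, and the perplexity values below are all made up for the example.

```python
def pick_num_topics(perplexity_by_k, min_relative_gain=0.05):
    """Return the largest topic count reached before the relative drop in
    perplexity from one candidate to the next falls below the threshold."""
    ks = sorted(perplexity_by_k)
    chosen = ks[0]
    for prev_k, k in zip(ks, ks[1:]):
        prev_score = perplexity_by_k[prev_k]
        gain = (prev_score - perplexity_by_k[k]) / prev_score
        if gain < min_relative_gain:
            break  # further topics add complexity without much better fit
        chosen = k
    return chosen


# Hypothetical perplexity scores for models with 2 to 6 topics.
scores = {2: 900.0, 3: 760.0, 4: 690.0, 5: 680.0, 6: 676.0}
print(pick_num_topics(scores))  # → 4
```

A number chosen this way is only a starting point. The quoted passage stresses that the interpretability of the resulting topics still has to be judged by reading them.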


Public Trust in Artificial Intelligence Applications in Mental Health Care: Topic Modeling Analysis

Shan, Ji, Xie et al. 2022

JMIR Hum Factors

Background: Mental disorders (MDs) impose heavy burdens on health care (HC) systems and affect a growing number of people worldwide. The use of mobile health (mHealth) apps empowered by artificial intelligence (AI) is increasingly being resorted to as a possible solution.

Objective: This study adopted a topic modeling (TM) approach to investigate the public trust in AI apps in mental health care (MHC) by identifying the dominant topics and themes in user reviews of the 8 most relevant mental health (MH) apps with the largest numbers of reviewers.

Methods: We searched Google Play for the top MH apps with the largest numbers of reviewers, from which we selected the most relevant apps. Subsequently, we extracted data from user reviews posted from January 1, 2020, to April 2, 2022. After cleaning the extracted data using the Python text processing tool spaCy, we ascertained the optimal number of topics, drawing on the coherence scores, and used latent Dirichlet allocation (LDA) TM to generate the most salient topics and related terms. We then classified the ascertained topics into different theme categories by plotting them onto a 2D plane via multidimensional scaling using the pyLDAvis visualization tool. Finally, we analyzed these topics and themes qualitatively to better understand the status of public trust in AI apps in MHC.

Results: From the top 20 MH apps with the largest numbers of reviewers retrieved, we chose the 8 (40%) most relevant apps: (1) Wysa: Anxiety Therapy Chatbot; (2) Youper Therapy; (3) MindDoc: Your Companion; (4) TalkLife for Anxiety, Depression & Stress; (5) 7 Cups: Online Therapy for Mental Health & Anxiety; (6) BetterHelp-Therapy; (7) Sanvello; and (8) InnerHour. These apps provided 14.2% (n=559), 11.0% (n=431), 13.7% (n=538), 8.8% (n=356), 14.1% (n=554), 11.9% (n=468), 9.2% (n=362), and 16.9% (n=663) of the collected 3931 reviews, respectively. The 4 dominant topics were topic 4 (cheering people up; n=1069, 27%), topic 3 (calming people down; n=1029, 26%), topic 2 (helping figure out the inner world; n=963, 25%), and topic 1 (being an alternative or complement to a therapist; n=870, 22%). Based on topic coherence and intertopic distance, topics 3 and 4 were combined into theme 3 (dispelling negative emotions), while topics 2 and 1 remained 2 separate themes: theme 2 (helping figure out the inner world) and theme 1 (being an alternative or complement to a therapist), respectively. These themes and topics, though involving some dissenting voices, reflected an overall high status of trust in AI apps.

Conclusions: This is the first study to investigate the public trust in AI apps in MHC from the perspective of user reviews using the TM technique. The automatic text analysis and complementary manual interpretation of the collected data allowed us to discover the dominant topics hidden in a data set and categorize these topics into different themes to reveal an overall high degree of public trust. The dissenting voices from users, though only a few, can serve as indicators for health providers and app developers to jointly improve these apps, which will ultimately facilitate the treatment of prevalent MDs and alleviate the overburdened HC systems worldwide.


Assessing the needs of patients with breast cancer and their families across various treatment phases using a Latent Dirichlet Allocation model: a text-mining approach to online health communities

Da, Duan, Ji et al. 2024

Support Care Cancer

No abstract

Tweet topics on cancer among Indian Twitter users—computational approach using latent Dirichlet allocation topic modelling

Ramamoorthy, Mappillairaju 2023

J Comput Soc Sc

No abstract
