Background: Increases in electronic nicotine delivery system (ENDS) use among high school students from 2017 to 2019 appear to be associated with the increasing popularity of the ENDS device JUUL. Objective: We employed a content analysis approach in conjunction with natural language processing methods using Twitter data to understand salient themes regarding JUUL use on Twitter, sentiment towards JUUL, and underage JUUL use. Methods: Between July 2018 and August 2019, 11,556 unique tweets containing a JUUL-related keyword were collected. We manually annotated 4000 tweets for JUUL-related themes of use and sentiment. We used 3 machine learning algorithms to classify positive and negative JUUL sentiments as well as underage JUUL mentions. Results: Of the annotated tweets, 78.80% (3152/4000) contained a specific mention of JUUL. Only 1.43% (45/3152) of tweets mentioned using JUUL as a method of smoking cessation, and only 6.85% (216/3152) of tweets mentioned the potential health effects of JUUL use. Of the machine learning methods used, the random forest classifier was the best-performing algorithm across all 3 classification tasks (ie, positive sentiment, negative sentiment, and underage JUUL mentions). Conclusions: Our findings suggest that a vast majority of Twitter users are not using JUUL to aid in smoking cessation, nor do they mention the potential health benefits or detriments of JUUL use. Using machine learning algorithms to identify tweets containing underage JUUL mentions can support the timely surveillance of JUUL habits and opinions, further assisting youth-targeted public health intervention strategies.
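The proportions reported in the Results can be re-derived from the counts given in the abstract. The short sketch below does only that arithmetic; the constant names and the `pct` helper are illustrative and do not come from the original study.

```python
# Counts taken from the abstract; the names are illustrative assumptions.
ANNOTATED_TWEETS = 4000   # manually annotated tweets
JUUL_MENTIONS = 3152      # annotated tweets with a specific JUUL mention
CESSATION = 45            # tweets mentioning JUUL for smoking cessation
HEALTH_EFFECTS = 216      # tweets mentioning potential health effects


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


shares = {
    "juul_mention": pct(JUUL_MENTIONS, ANNOTATED_TWEETS),
    "cessation": pct(CESSATION, JUUL_MENTIONS),
    "health_effects": pct(HEALTH_EFFECTS, JUUL_MENTIONS),
}

for name, value in shares.items():
    print(f"{name}: {value:.2f}%")
```

Note that the cessation and health-effect shares use the 3152 JUUL-specific tweets as the denominator, not all 4000 annotated tweets.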
Background: Perceptions of tobacco, cannabis, and electronic nicotine delivery systems (ENDS) are continually evolving in the United States. Exploring these characteristics through user-generated text sources may provide novel insights into product use behavior that are challenging to identify using survey-based methods. The objective of this study was to compare the topics frequently discussed among Reddit members in cannabis, tobacco, and ENDS-specific subreddits. Methods: We collected 643,070 posts on the social media site Reddit between January 2013 and December 2018. We developed and validated an annotation scheme, achieving a high level of agreement among annotators. We then manually coded a subset of 2,630 posts for their content with relation to experiences and use of the three products of interest, and further developed word cloud representations of the words contained in these posts. Finally, we applied Latent Dirichlet Allocation (LDA) topic modeling to the 643,070 posts to identify emerging themes related to cannabis, tobacco, and ENDS products being discussed on Reddit. Results: Our manual annotation process yielded 2,148 (81.6%) posts that contained at least one mention of cannabis, tobacco, or ENDS, with 1,537 (71.5%) of these posts mentioning cannabis, 421 (19.5%) mentioning ENDS, and 264 (12.2%) mentioning tobacco. In cannabis-specific subreddits, personal experiences with cannabis, cannabis legislation, health effects of cannabis use, methods and forms of cannabis, and the cultivation of cannabis were commonly discussed topics. Discussion in tobacco-specific subreddits often focused on brands and types of combustible tobacco, as well as smoking cessation experiences and advice. 
In ENDS-specific subreddits, topics often included ENDS accessories and parts, flavors and nicotine solutions, procurement of ENDS, and the use of ENDS for smoking cessation. Conclusion: Our findings highlight the posting and participation patterns of Reddit members in cannabis, tobacco, and ENDS-specific subreddits and provide novel insights into aspects of personal use regarding these products. These findings complement epidemiologic study designs and highlight the potential of using specific subreddits to explore personal experiences with cannabis, ENDS, and tobacco products.
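The abstract reports a high level of agreement among annotators but does not name the statistic used. As a minimal sketch, assuming a two-annotator setting and Cohen's kappa (an assumption, not the authors' stated method), agreement on post labels could be quantified as follows. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels, invented for illustration only (not study data).
annotator_1 = ["cannabis", "cannabis", "ends", "tobacco", "ends", "cannabis"]
annotator_2 = ["cannabis", "cannabis", "ends", "tobacco", "cannabis", "cannabis"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw percent agreement for agreement expected by chance, which matters here because one category (cannabis) dominates the annotated posts.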
PURPOSE Histopathologic features are critical for studying risk factors of colorectal polyps, but remain deeply embedded within unstructured pathology reports, requiring costly and time-consuming manual abstraction for research. In this study, we developed and evaluated a natural language processing (NLP) pipeline to automatically extract histopathologic features of colorectal polyps from pathology reports, with an emphasis on individual polyp size. These data were then linked with structured electronic health record (EHR) data, creating an analysis-ready epidemiologic data set. METHODS We obtained 24,584 pathology reports from colonoscopies performed at the University of Utah’s Gastroenterology Clinic. Two investigators annotated 350 reports to determine inter-rater agreement, develop an annotation scheme, and create a reference standard for performance evaluation. The pipeline was then developed, and performance was compared against the reference for extracting polyp location, histology, size, shape, dysplasia, and the number of polyps. Finally, the pipeline was applied to 24,225 unseen reports and NLP-extracted data were linked with structured EHR data. RESULTS Across all features, our pipeline achieved a precision of 98.9%, a recall of 98.0%, and an F1-score of 98.4%. In patients with polyps, the pipeline correctly extracted 95.6% of sizes, 97.2% of polyp locations, 97.8% of histology, 98.3% of shapes, and 98.3% of dysplasia levels. When applied to unseen data, the pipeline classified 12,889 patients as having polyps, 4,907 patients without polyps, and extracted the features of 28,387 polyps. Tubular adenomas were the most common subtype (55.9%), 8.1% of polyps were advanced adenomas, and the mean polyp size was 0.57 (±0.4) cm. CONCLUSION Our pipeline extracted histopathologic features of colorectal polyps from colonoscopy pathology reports, most notably individual polyp sizes, with considerable accuracy. 
This study demonstrates the utility of NLP for extracting polyp features and linking these data with EHR data to create an epidemiologic data set to study colorectal polyp risk factors and outcomes.
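To make the size-extraction step concrete, the sketch below pulls polyp sizes out of free text with a regular expression and normalizes them to centimeters. It is not the authors' pipeline: the pattern, the function names, and the sample report text are all hypothetical. The last lines check that the reported F1-score is consistent with the harmonic mean of the reported precision and recall.

```python
import re

# Hypothetical pattern: one or more dimensions ("0.5 x 0.3 x 0.2") then a unit.
SIZE_RE = re.compile(
    r"(?P<dims>\d+(?:\.\d+)?(?:\s*x\s*\d+(?:\.\d+)?)*)\s*(?P<unit>cm|mm)\b",
    re.IGNORECASE,
)


def extract_sizes_cm(text):
    """Return the largest dimension of each size mention, in centimeters."""
    sizes = []
    for match in SIZE_RE.finditer(text):
        dims = [
            float(d)
            for d in re.split(r"\s*x\s*", match.group("dims"), flags=re.IGNORECASE)
        ]
        largest = max(dims)
        if match.group("unit").lower() == "mm":
            largest /= 10  # convert millimeters to centimeters
        sizes.append(round(largest, 2))
    return sizes


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Invented report text, for illustration only.
report = (
    "A. Colon, ascending, polyp (0.4 cm): tubular adenoma. "
    "B. Sigmoid polyp, 6 mm: hyperplastic polyp. "
    "C. Rectum, polyp 0.5 x 0.3 x 0.2 cm."
)
sizes = extract_sizes_cm(report)
print(sizes)

# Precision and recall as reported in the abstract.
f1 = f1_score(0.989, 0.980)
print(f"F1 = {100 * f1:.1f}%")
```

Taking the largest dimension is one possible convention for multi-dimensional measurements; a real pipeline would also need to tie each size to the correct polyp and specimen, which this sketch does not attempt.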