Sifei Han scite author profile

On the popularity of the USB flash drive-shaped electronic cigarette Juul

Introduced in mid-2015, Juul is currently the most popular electronic cigarette (e-cigarette), with a market share of 49.6% (last four weeks as of 01/27/2018) and YoY growth of nearly 700% based on Nielsen market data [1]. Juul is a compact closed-system device charged via USB (Figure 1) and comes with disposable flavored pods, each of which contains 0.7 ml of liquid with 5% nicotine by weight. Each pod is nearly equivalent to one pack of cigarettes or 200 puffs according to official Juul documentation (from www.juulvapor.com), which also states that Juul is specifically "designed with smokers in mind" and is "for adult smokers seeking a satisfying alternative to cigarettes." To order Juul online, age verification (21+) is required and is implemented through a third-party verification service provider. The purpose of this letter is to describe initial observations of recent Juul-related messages on two different social networks (Twitter and Reddit) and in traditional media.

Juul messages on Twitter: We collected 250,873 tweets mentioning the word "juul" and its variants "juuling" and "juuled" from 10/19/2017 to 02/14/2018 using Twitter's free streaming API service. Due to rate limits imposed by Twitter Inc. on free data collection using their API, this dataset does not represent an exhaustive set of tweets matching our keywords during that period. This is also a filtered set and excludes tweets generated by organizations and users whose user name contains "juul" (potentially part of their actual name). Only a third of the data (84,729 tweets) represent unique tweets, the rest arising from the retweet mechanism. The duplicates from the top ten retweeted messages account for 29% of the full dataset (72,521 retweets). Next, we make some observations that highlight the nature of Juul tweets' contents based on regular-expression-based searches on the dataset.

Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task

Sarker

Belousov

Friedrichs

et al. 2018

Objective: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data. Materials and Methods: We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15 717 annotated tweets for (1), 10 260 for (2), and 6650 ADR phrases and identifiers for (3); and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. We evaluated performances of classes of methods and ensembles of system combinations following the shared tasks. Results: Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems. Discussion: Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1). Conclusions: Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies (http://dx.doi.org/10.17632/rxwfb3tysd.1).

Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing

Han

Zhang

Shi

et al. 2022

Journal of Biomedical Informatics

Unsupervised Extraction of Diagnosis Codes from EMRs Using Knowledge-Based and Extractive Text Summarization Techniques

Kavuluru

Han

Harris

2013

Diagnosis codes are extracted from medical records for billing and reimbursement and for secondary uses such as quality control and cohort identification. In the US, these codes come from the standard terminology ICD-9-CM derived from the international classification of diseases (ICD). ICD-9 codes are generally extracted by trained human coders by reading all artifacts available in a patient’s medical record following specific coding guidelines. To assist coders in this manual process, this paper proposes an unsupervised ensemble approach to automatically extract ICD-9 diagnosis codes from textual narratives included in electronic medical records (EMRs). Earlier attempts on automatic extraction focused on individual documents such as radiology reports and discharge summaries. Here we use a more realistic dataset and extract ICD-9 codes from EMRs of 1000 inpatient visits at the University of Kentucky Medical Center. Using named entity recognition (NER), graph-based concept-mapping of medical concepts, and extractive text summarization techniques, we achieve an example based average recall of 0.42 with average precision 0.47; compared with a baseline of using only NER, we notice a 12% improvement in recall with the graph-based approach and a 7% improvement in precision using the extractive text summarization approach. Although diagnosis codes are complex concepts often expressed in text with significant long range non-local dependencies, our present work shows the potential of unsupervised methods in extracting a portion of codes. As such, our findings are especially relevant for code extraction tasks where obtaining large amounts of training data is difficult.

Exploratory Analysis of Marketing and Non-marketing E-cigarette Themes on Twitter

Han

Kavuluru

2016

Electronic cigarettes (e-cigs) have been gaining popularity and have emerged as a controversial tobacco product since their introduction in 2007 in the U.S. The smoke-free aspect of e-cigs renders them less harmful than conventional cigarettes and is one of the main reasons for their use by people who plan to quit smoking. The US Food and Drug Administration (FDA) introduced new regulations in early May 2016 that went into effect on August 8, 2016. Given this important context, in this paper, we report results of a project to identify current themes in e-cig tweets in terms of semantic interpretations of topics generated with topic modeling. Given that marketing/advertising tweets constitute almost half of all e-cig tweets, we first build a classifier that identifies marketing and non-marketing tweets based on a hand-built dataset of 1000 tweets. After applying the classifier to a dataset of over a million tweets (collected during 4/2015 – 6/2016), we conduct a preliminary content analysis and run topic models on the two sets of tweets separately after identifying the appropriate numbers of topics using topic coherence. We interpret the results of the topic modeling process by relating the topics generated to specific e-cig themes. We also report on themes identified from e-cig tweets generated at particular places (such as schools and churches) for geo-tagged tweets found in our dataset using the GeoNames API. To our knowledge, this is the first effort that employs topic modeling to identify e-cig themes in general and in the context of geo-tagged tweets tied to specific places of interest.

Analytical validation of GMEX rapid point-of-care CYP2C19 genotyping system for the CHANCE-2 trial

et al. 2021

Background and purpose: Rapid genotyping is useful for guiding early antiplatelet therapy in patients with high-risk nondisabling ischaemic cerebrovascular events (HR-NICE). Conventional genetic testing methods used in CYP2C19 genotype-guided antiplatelet therapy for patients with HR-NICE did not satisfy the needs of the Clopidogrel in High-Risk Patients with Acute Nondisabling Cerebrovascular Events (CHANCE)-2 trial. Therefore, we developed the rapid-genotyping GMEX (point-of-care) system to meet the needs of the CHANCE-2 trial. Methods: Healthy individuals and patients with history of cardiovascular diseases (n=408) were enrolled from six centres of the CHANCE-2 trial. We compared the laboratory-based genomic test results with Sanger sequencing test results for accuracy verification. Next, we demonstrated the accuracy, timeliness and clinical operability of the GMEX system compared with laboratory-based technology (YZY Kit) to verify whether the GMEX system satisfies the needs of the CHANCE-2 trial. Results: Genotypes reported by the GMEX system showed 100% agreement with those determined by using the YZY Kit and Sanger sequencing for all three CYP2C19 alleles (*2, *3 and *17) tested. The average result’s turnaround times for the GMEX and YZY Kit methods were 85.0 (IQR: 85.0–86.0) and 1630.0 (IQR: 354.0–7594.0) min (p<0.001), respectively. Conclusions: Our data suggest that the GMEX system is a reliable and feasible point-of-care system for rapid CYP2C19 genotyping for the CHANCE-2 trial or related clinical and research applications.

On Assessing the Sentiment of General Tweets

Han

Kavuluru

2015

Extracting social determinants of health events with transformer-based multitask, multilabel named entity recognition

Richie

Ruiz

Han

et al. 2023

Objective: Social determinants of health (SDOH) are nonclinical, socioeconomic conditions that influence patient health and quality of life. Identifying SDOH may help clinicians target interventions. However, SDOH are more frequently available in narrative notes compared to structured electronic health records. The 2022 n2c2 Track 2 competition released clinical notes annotated for SDOH to promote development of NLP systems for extracting SDOH. We developed a system addressing 3 limitations in state-of-the-art SDOH extraction: the inability to identify multiple SDOH events of the same type per sentence, overlapping SDOH attributes within text spans, and SDOH spanning multiple sentences. Materials and Methods: We developed and evaluated a 2-stage architecture. In stage 1, we trained a BioClinical-BERT-based named entity recognition system to extract SDOH event triggers, that is, text spans indicating substance use, employment, or living status. In stage 2, we trained a multitask, multilabel NER to extract arguments (eg, alcohol “type”) for events extracted in stage 1. Evaluation was performed across 3 subtasks differing by provenance of training and validation data using precision, recall, and F1 scores. Results: When trained and validated on data from the same site, we achieved 0.87 precision, 0.89 recall, and 0.88 F1. Across all subtasks, we ranked between second and fourth place in the competition and always within 0.02 F1 from first. Conclusions: Our 2-stage, deep-learning-based NLP system effectively extracted SDOH events from clinical notes. This was achieved with a novel classification framework that leveraged simpler architectures compared to state-of-the-art systems. Improved SDOH extraction may help clinicians improve health outcomes.
