Background
With the rapid development of new psychoactive substances (NPS) and changes in the use of more traditional drugs, it is increasingly difficult for researchers and public health practitioners to keep up with emerging drugs and drug terms. Substance use surveys and diagnostic tools need to be able to ask about substances using the terms that drug users themselves are likely to use. Analyses of social media may offer new ways for researchers to uncover and track changes in drug terms in near real time. This study describes the initial results from an innovative collaboration between substance use epidemiologists and linguistic scientists that employs techniques from natural language processing to examine drug-related terms in a sample of tweets from the United States.

Objective
The objective of this study was to assess the feasibility of using distributed word-vector embeddings trained on social media data to uncover drug terms previously unknown to researchers.

Methods
In this pilot study, we trained a continuous bag of words (CBOW) model of distributed word-vector embeddings on a Twitter dataset collected during July 2016 (roughly 884.2 million tokens). We queried the trained word embeddings for terms with high cosine similarity (a proxy for semantic relatedness) to well-known slang terms for marijuana, producing a list of candidate terms likely to function as slang terms for this substance. This candidate list was then compared with an expert-generated list of marijuana terms to assess the accuracy and efficacy of using word-vector embeddings to search for novel drug terminology.

Results
The method produced a list of 200 candidate terms for the target substance (marijuana). Of these 200 candidates, 115 were confirmed to relate to marijuana (65 terms for the substance itself and 50 terms for paraphernalia). These included 30 terms that were used to refer to the target substance in the corpus but did not appear on the expert-generated list; they were therefore considered successful cases of uncovering novel drug terminology. Several of these novel terms appear to have been introduced as recently as 1 or 2 months before the corpus time slice used to train the word embeddings.

Conclusions
Although the precision of the method is low enough that any candidate term list generated in this way still requires human review, the fact that the process detected 30 novel terms for the target substance from only one month's worth of Twitter data is highly promising. We see this pilot study as an important proof of concept and a first step toward a fully automated drug term discovery system capable of tracking emerging NPS terms in real time.
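The query step in the Methods (ranking vocabulary words by cosine similarity to a seed slang term) and the precision of the resulting candidate list can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the study's pipeline: it does not train a CBOW model, the 3-dimensional vectors and words are hypothetical stand-ins for trained embeddings, and `cosine`, `most_similar`, and `precision` are hypothetical helper names.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def most_similar(seed, vectors, topn):
    """Rank every other vocabulary word by cosine similarity to the seed word."""
    scored = [(word, cosine(vectors[seed], vec))
              for word, vec in vectors.items() if word != seed]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

def precision(candidates, relevant):
    """Share of candidate terms that human review judged relevant."""
    return sum(1 for c in candidates if c in relevant) / len(candidates)

# Hypothetical toy vectors standing in for trained CBOW embeddings.
vectors = {
    "weed":  [0.9, 0.1, 0.0],
    "kush":  [0.8, 0.2, 0.1],
    "blunt": [0.7, 0.3, 0.0],
    "pizza": [0.0, 0.1, 0.9],
}

neighbors = most_similar("weed", vectors, topn=2)
candidate_terms = [word for word, _ in neighbors]
# Hypothetical reviewer judgments for the toy candidates.
reviewed_relevant = {"kush", "blunt"}
print(candidate_terms, precision(candidate_terms, reviewed_relevant))
```

Applied to the counts reported above, precision is 115/200 = 0.575 if paraphernalia terms count as relevant, or 65/200 = 0.325 if only terms for the substance itself count.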
Sentiments towards racial/ethnic minorities may affect cardiovascular disease (CVD) through direct and indirect pathways. In this study, we assessed the association between state-level Twitter-derived sentiments towards racial/ethnic minorities and individual-level CVD-related outcomes from the 2017 Behavioral Risk Factor Surveillance System (BRFSS). Outcomes included hypertension, diabetes, obesity, stroke, myocardial infarction (MI), coronary heart disease (CHD), and any CVD (N=433,434 to 433,680 across outcomes). A total of 30 million race-related tweets were collected using the Twitter Streaming Application Programming Interface (API) from 2015 to 2018. State-level prevalences of negative and positive sentiment towards racial/ethnic minorities were constructed and merged with the individual-level CVD outcomes. We used Poisson regression, with all models adjusted for individual-level and state-level demographics. Individuals living in states with the highest level of negative sentiment towards racial/ethnic minorities had 11% higher prevalence of hypertension (
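With a log link, a Poisson regression coefficient b for an exposure exponentiates to a prevalence ratio, so "11% higher prevalence" corresponds to exp(b) of about 1.11. The sketch below shows only that relationship. It is an unadjusted illustration under stated assumptions: a single hypothetical binary exposure (living in a high-negative-sentiment state), hypothetical toy outcome data, and no covariates, whereas the study's models adjusted for individual-level and state-level demographics. `fit_poisson` is a hypothetical helper fitted by Newton-Raphson.

```python
import math

def fit_poisson(x, y, iters=50, tol=1e-10):
    """Fit log E[y] = b0 + b1*x by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the overall log-mean
    for _ in range(iters):
        # Score vector (s0, s1) and 2x2 information matrix [[i00, i01], [i01, i11]].
        s0 = s1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)
            s0 += yi - mu
            s1 += (yi - mu) * xi
            i00 += mu
            i01 += mu * xi
            i11 += mu * xi * xi
        # Newton step: inverse information times score, via the closed-form 2x2 inverse.
        det = i00 * i11 - i01 * i01
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical exposure: 1 = lives in a high-negative-sentiment state.
x = [1] * 100 + [0] * 100
# Hypothetical outcome: 33 of 100 exposed and 30 of 100 unexposed have the condition.
y = [1] * 33 + [0] * 67 + [1] * 30 + [0] * 70

b0, b1 = fit_poisson(x, y)
prevalence_ratio = math.exp(b1)
print(f"baseline prevalence {math.exp(b0):.3f}, prevalence ratio {prevalence_ratio:.3f}")
```

With one binary exposure and no covariates, exp(b1) reproduces the crude ratio 0.33 / 0.30 = 1.10, that is, 10% higher prevalence in the exposed group; covariate adjustment would add further columns to the design matrix.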
Social media research studies often have two things in common: they use Twitter as the platform, and they apply a keyword filter list to extract only relevant tweets. Here we propose that (a) researchers consider alternative platforms more often when doing social media research, and (b) regardless of platform, researchers use word embeddings as a form of synonym discovery to improve their keyword filter lists; both measures lead to more relevant data. We demonstrate the benefit of these proposals by comparing how successfully our synonym discovery method finds terms for marijuana and select opioids on Twitter versus Reddit, a platform that can be filtered by topic. We also find words that are not on the U.S. Drug Enforcement Administration (DEA) drug slang list for that year, some of which appear on the list the following year, showing that this method could be used to find drug terms faster than traditional means.
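The synonym-discovery idea can be sketched as three steps: expand a seed keyword list with embedding neighbors above a similarity threshold, filter posts with the expanded list, and flag expanded terms missing from a reference slang list. This is a minimal sketch under stated assumptions: the toy vectors, posts, threshold of 0.9, and reference list are all hypothetical (the reference set is a stand-in, not the DEA list), and `expand_keywords` and `keep_relevant` are hypothetical helper names rather than the authors' code.

```python
import math
import re

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def expand_keywords(seeds, vectors, threshold):
    """Add every vocabulary word whose similarity to any seed meets the threshold."""
    expanded = set(seeds)
    for word, vec in vectors.items():
        if any(cosine(vectors[s], vec) >= threshold for s in seeds if s in vectors):
            expanded.add(word)
    return expanded

def keep_relevant(posts, keywords):
    """Keep posts that contain at least one keyword as a whole token."""
    kept = []
    for post in posts:
        tokens = set(re.findall(r"[a-z0-9']+", post.lower()))
        if tokens & keywords:
            kept.append(post)
    return kept

# Hypothetical toy embeddings, posts, and reference slang list.
vectors = {
    "weed":  [0.9, 0.1, 0.0],
    "kush":  [0.8, 0.2, 0.1],
    "dabs":  [0.75, 0.25, 0.05],
    "pizza": [0.0, 0.1, 0.9],
}
posts = ["Picked up some kush today", "pizza night",
         "dabs all weekend", "weeding the garden"]
reference_list = {"weed", "kush"}

expanded = expand_keywords({"weed"}, vectors, threshold=0.9)
kept = keep_relevant(posts, expanded)
novel = sorted(expanded - reference_list)  # candidates absent from the reference list
print(kept, novel)
```

Matching whole tokens rather than substrings keeps "weeding" from passing a "weed" filter; as the abstracts above note, any term the expansion adds still needs human review before it enters a filter list.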
Imposters, seemingly third person nouns with speech act participant reference, have been variously analyzed as being licensed through an elaborated DP syntax (Collins and Postal. 2008. Imposters. Manuscript. http://ling.auf.net/lingbuzz/000640 (accessed 12 May 2017), Collins and Postal. 2012. Imposters: A study of pronominal agreement. Cambridge: MIT Press) or through lexical specification (Kaufman 2014. The syntax of Indonesian imposters. In Chris Collins (ed.), Cross-linguistic studies of imposters and pronominal agreement, 89–120. Oxford: Oxford University Press). Looking at Korean and Indonesian, two languages that make frequent use of imposters, we show that both can be accounted for without appeal to an elaborated DP syntax and that, in fact, such a structure makes the wrong predictions. Rather, other heads in the clause, in conjunction with differences in lexical specification, can account for both languages. In Indonesian, which freely allows imposters to bind anaphors bearing the person features of the referent, the imposter is lexically specified for those features. In Korean, where such binding is restricted, imposters are underspecified for person, so anaphors occur only when another person feature-carrying head supplies the necessary features (Zanuttini et al. 2012. A syntactic analysis of interpretive restrictions on imperative, promissive, and exhortative subjects. Natural Language & Linguistic Theory 30(4). 1231–1274). Previously unexplained was why Korean imposters cannot bind any person-marked anaphors, including third person ones, if person-underspecified DPs are assumed to be valued with a default third person feature. We argue that this follows from a difference between two types of third person: those specified for third person and those that are not (Sigurðsson 2010. On EPP effects. Studia Linguistica 64(2). 159–189).