BACKGROUND In recent years, inappropriate drug use, known as medication noncompliance, has become a growing problem as the distribution and sale of drugs on the internet have increased. We therefore aimed to monitor improper drug use on social media. Because constructing a corpus for such monitoring is costly, we attempted transfer learning between corpora for drugs with similar chemical structures.
OBJECTIVE We implemented a multilabel classification of social media texts based on medication noncompliance. In addition, we used the chemical similarity of the drugs to examine whether transfer learning across sub-corpora is feasible.
METHODS We used the MediA corpus for medication noncompliance, in which the labels Noncompliant use/mention, Noncompliant sale, General use, and General mention are assigned to tweets mentioning 20 different drugs. The classification model for tweets about a specific drug was trained by transfer in two settings: on tweets about one other drug (single sub-corpus transfer learning) and on tweets about several other drugs (multi-sub-corpus incremental learning). Based on drug structure similarity, we evaluated whether some sub-corpora were more effective than others as sources for transfer learning.
RESULTS A slight correlation of 0.278 was observed between the structural similarity of drugs and classification performance. When the number of sub-corpora was small, the model trained by transfer learning on sub-corpora of structurally similar drugs performed better than the model trained by adding sub-corpora at random.
CONCLUSIONS The results suggest that structural similarity improves the classification performance of messages about unknown drugs when the training corpus covers only a few drugs. Conversely, there is little need to consider Tanimoto structural similarity if a sufficient variety of drugs is ensured.
Background Medication noncompliance is a critical issue because of the increased number of drugs sold on the web. Web-based drug distribution is difficult to control, causing problems such as drug noncompliance and abuse. Existing medication compliance surveys lack completeness because they cannot cover patients who do not go to the hospital or who do not provide accurate information to their doctors, so a social media-based approach is being explored to collect information about drug use. Social media data, which include information on drug usage by users, can be used to detect drug abuse and to assess medication compliance in patients.
Objective This study aimed to assess how the structural similarity of drugs affects the efficiency of machine learning models for text classification of drug noncompliance.
Methods This study analyzed 22,022 tweets about 20 different drugs. The tweets were labeled as noncompliant use or mention, noncompliant sales, general use, or general mention. The study compared 2 methods for training machine learning models for text classification: single-sub-corpus transfer learning, in which a model is trained on tweets about a single drug and then tested on tweets about other drugs, and multi-sub-corpus incremental learning, in which models are trained on tweets about drugs in order of their structural similarity. The performance of a model trained on a single subcorpus (a data set of tweets about a specific category of drugs) was compared with the performance of a model trained on multiple subcorpora (data sets of tweets about multiple categories of drugs).
Results The performance of the model trained on a single subcorpus varied depending on the specific drug used for training. The Tanimoto similarity (a measure of the structural similarity between compounds) was weakly correlated with the classification results. When the number of subcorpora was small, the model trained by transfer learning on subcorpora of structurally similar drugs performed better than the model trained by adding subcorpora at random.
Conclusions The results suggest that structural similarity improves the classification performance of messages about unknown drugs when the training corpus covers only a few drugs. Conversely, there is little need to consider the Tanimoto structural similarity if a sufficient variety of drugs is ensured.
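As a rough illustration of the similarity-ordered training described above, the sketch below computes the Tanimoto coefficient between molecular fingerprints and ranks candidate source drugs by their similarity to a target drug. It assumes each fingerprint is represented as a set of "on" bit indices, for which the coefficient is the size of the intersection divided by the size of the union. The abstract does not say which fingerprint type or toolkit the authors used, so the drug names, the bit sets, and the helper `rank_sources` are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# A fingerprint is modeled as the set of bit positions that are "on".
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: |A intersect B| / |A union B|."""
    union = len(a | b)
    if union == 0:
        # Assumed convention: two empty fingerprints share nothing.
        return 0.0
    return len(a & b) / union


def rank_sources(
    target: str, fingerprints: Dict[str, Fingerprint]
) -> List[Tuple[str, float]]:
    """Order all other drugs from most to least similar to the target.

    Hypothetical helper: one plausible way to choose the order in which
    sub-corpora are added during incremental training.
    """
    target_fp = fingerprints[target]
    scored = [
        (name, tanimoto(target_fp, fp))
        for name, fp in fingerprints.items()
        if name != target
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy fingerprints, invented for illustration only (not real drug data).
TOY_FINGERPRINTS: Dict[str, Fingerprint] = {
    "drug_a": frozenset({1, 2, 3, 4}),
    "drug_b": frozenset({1, 2, 3, 5}),
    "drug_c": frozenset({1, 7, 8, 9}),
    "drug_d": frozenset({10, 11}),
}

training_order = rank_sources("drug_a", TOY_FINGERPRINTS)
for name, score in training_order:
    print(f"{name}: {score:.3f}")
```

With real data, the bit sets would come from a cheminformatics toolkit, and the ranked list would determine which sub-corpus is added to the training set next. A random ordering of the same sub-corpora would serve as the baseline for comparison.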