Objective: The novel coronavirus disease (COVID-19), which broke out in December 2019, is a global pandemic. In the past few months, a large number of clinical studies have been rapidly initiated worldwide to find effective therapeutics, vaccines, and preventive strategies. In this study, we aim to understand the landscape of COVID-19 clinical research and identify the gaps and issues that may cause recruitment difficulty and a lack of population representativeness. Materials and Methods: We analyzed 2,034 COVID-19 studies registered in the largest public registry, ClinicalTrials.gov. Leveraging natural language processing, descriptive analysis, association analysis, and clustering analysis, we characterized COVID-19 clinical studies by phase and design features. In particular, we analyzed their eligibility criteria to understand (1) whether they considered the reported underlying health conditions that may lead to severe illness, and (2) whether they excluded older adults, either explicitly or implicitly, which may reduce the generalizability of these studies to older adults. Results: The 5 most frequently tested drugs are Hydroxychloroquine (N=148), Azithromycin (N=46), Tocilizumab (N=29), Lopinavir (N=20), and Ritonavir (N=20). Most trials did not have an upper age limit and did not exclude patients with common chronic conditions, such as hypertension and diabetes, that are prevalent in older adults. However, known risk factors that may lead to severe illness have not been adequately considered by existing studies. Conclusions: A careful examination of the registered COVID-19 clinical studies can identify research gaps and inform future COVID-19 trial design towards balanced internal validity and generalizability.
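The explicit age-exclusion check described in this abstract can be illustrated with a minimal sketch. It assumes each study record carries a free-text maximum-age field in a form such as "65 Years" or "N/A"; the function names and the 65-year threshold are illustrative assumptions, not the authors' method, and implicit exclusion through other criteria is not covered.

```python
import re
from typing import Optional

# Assumed conversion factors from a stated unit to years (approximate).
UNITS_PER_YEAR = {"year": 1.0, "month": 12.0, "week": 52.0, "day": 365.0}


def parse_age_years(value: Optional[str]) -> Optional[float]:
    """Convert an age string such as '65 Years' to years.

    Returns None when no limit is stated (for example 'N/A' or an empty field).
    """
    if not value:
        return None
    match = re.match(
        r"\s*(\d+(?:\.\d+)?)\s*(year|month|week|day)s?\b", value, re.IGNORECASE
    )
    if match is None:
        return None
    number = float(match.group(1))
    unit = match.group(2).lower()
    return number / UNITS_PER_YEAR[unit]


def explicitly_excludes_older_adults(
    max_age: Optional[str], threshold: float = 65.0
) -> bool:
    """True when a stated upper age limit falls below the (assumed) threshold."""
    limit = parse_age_years(max_age)
    return limit is not None and limit < threshold
```

A study with no upper age limit is treated as not explicitly excluding older adults, which matches the distinction the abstract draws between explicit and implicit exclusion.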
Identification and indexing of chemical compounds in full-text articles are essential steps in biomedical article categorization, information extraction, and biological text mining. The BioCreative Challenge was established to evaluate methods for biological text mining and information extraction. Track 2 of BioCreative VII (summer 2021) consists of two subtasks: chemical identification and chemical indexing in full-text PubMed articles. The chemical identification subtask itself has two parts: chemical named entity recognition (NER) and chemical normalization. In this paper, we present our work on developing a hybrid pipeline for chemical named entity recognition, chemical normalization, and chemical indexing in full-text PubMed articles. Specifically, we applied BERT-based methods for chemical NER and chemical indexing, and a sieve-based dictionary matching method for chemical normalization. For subtask 1, we used PubMedBERT with data augmentation on the chemical NER task. Several chemical-MeSH dictionaries, including MeSH.XML, SUPP.XML, MRCONSO.RRF, and PubTator chemical annotations, are applied in a specific order to get the best performance on chemical normalization. We achieved F1 scores of 0.86 and 0.7668 on chemical NER and chemical normalization, respectively. We formulated subtask 2 as a binary prediction problem for each individual chemical compound name. We then used a BERT-based model with engineered features and achieved a strict F1 score of 0.4825 on the test set, which is substantially higher than the median F1 score (0.3971) of all the submissions.
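The idea of sieve-based dictionary matching, in which dictionaries are consulted in a fixed priority order and the first hit wins, can be sketched as follows. This is a minimal illustration under stated assumptions: the key normalization rule, the function names, and the placeholder identifiers (`ID-001`, `ID-999`) are hypothetical, and the abstract does not describe the authors' actual matching rules or dictionary order.

```python
import re
from typing import Dict, List, Optional, Tuple

Sieve = Tuple[str, Dict[str, str]]  # (dictionary name, normalized name -> identifier)


def normalize_key(mention: str) -> str:
    """Lowercase and collapse punctuation and whitespace so near-identical names match."""
    return re.sub(r"[^a-z0-9]+", " ", mention.lower()).strip()


def build_sieves(dictionaries: List[Sieve]) -> List[Sieve]:
    """Pre-normalize every dictionary key, keeping the given priority order."""
    return [
        (name, {normalize_key(term): ident for term, ident in entries.items()})
        for name, entries in dictionaries
    ]


def sieve_normalize(mention: str, sieves: List[Sieve]) -> Optional[Tuple[str, str]]:
    """Return (identifier, dictionary name) from the first sieve that matches."""
    key = normalize_key(mention)
    for name, table in sieves:
        if key in table:
            return table[key], name
    return None  # no dictionary contained the mention


# Hypothetical toy dictionaries with placeholder identifiers.
sieves = build_sieves([
    ("primary", {"Aspirin": "ID-001"}),
    ("secondary", {"aspirin": "ID-999", "acetyl-salicylic acid": "ID-001"}),
])
```

Because earlier sieves take precedence, the more trusted dictionaries should come first; a later, noisier dictionary only contributes when none of the earlier ones matches.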
Objective: In the past few months, a large number of clinical studies on the novel coronavirus disease (COVID-19) have been initiated worldwide to find effective therapeutics, vaccines, and preventive strategies for COVID-19. In this study, we aim to understand the landscape of COVID-19 clinical research and identify the issues that may cause recruitment difficulty or reduce study generalizability. Methods: We analyzed 3,765 COVID-19 studies registered in the largest public registry, ClinicalTrials.gov, leveraging natural language processing and using descriptive, association, and clustering analyses. We first characterized COVID-19 studies by study features such as phase and tested intervention. We then analyzed their eligibility criteria in depth to understand whether these studies (1) considered the reported underlying health conditions that may lead to severe illness, and (2) excluded older adults, either explicitly or implicitly, which may reduce the generalizability of these studies to the older adult population. Results: Our analysis included 2,295 interventional studies and 1,470 observational studies. Most trials did not explicitly exclude older adults with common chronic conditions. However, known risk factors such as diabetes and hypertension were considered by fewer than 5% of trials, based on their trial descriptions. Pregnant women were excluded by 34.9% of the studies. Conclusions: Most COVID-19 clinical studies included both genders and older adults. However, risk factors such as diabetes, hypertension, and pregnancy were under-represented, likely skewing the population that was sampled. A careful examination of existing COVID-19 studies can inform future COVID-19 trial design towards balanced internal validity and generalizability. Lay Summary: Since the outbreak of COVID-19 in early 2020, thousands of clinical studies have been conducted to evaluate the efficacy and safety of various types of treatments and vaccines in humans.
COVID-19 clinical studies play a crucial role in controlling the virus. Yet it is unclear what types of patients were considered by these studies. This study analyzed 3,765 COVID-19 clinical study summaries downloaded from a major clinical trial registry, ClinicalTrials.gov. We employed natural language processing techniques to parse the study descriptions and eligibility criteria of these studies and then performed descriptive and clustering analyses on the parsing results. We found that older adults were not systematically excluded but pregnant women were often excluded. We also found that known risk factors such as diabetes, hypertension, obesity, and asthma, which may lead to serious illness, were considered by fewer than 5% of the studies according to their study descriptions and eligibility criteria. This study provides evidence that natural language processing can be applied to examine the design of clinical studies and identify issues that may cause delays in patient recruitment and a lack of real-world population representativeness.
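A descriptive statistic of the kind reported above, namely the share of studies whose text mentions a given risk factor, can be approximated with a simple keyword search. The sketch below is a stand-in for the natural language processing pipeline, which the abstract does not detail; the synonym lists and the function name are assumptions, and a keyword hit does not distinguish inclusion from exclusion.

```python
import re
from typing import Dict, List

# Assumed, deliberately small synonym lists for the risk factors named in the text.
RISK_FACTOR_TERMS: Dict[str, List[str]] = {
    "diabetes": ["diabetes", "diabetic"],
    "hypertension": ["hypertension", "high blood pressure"],
    "obesity": ["obesity", "obese"],
    "asthma": ["asthma"],
}


def mention_rates(
    texts: List[str], terms: Dict[str, List[str]] = RISK_FACTOR_TERMS
) -> Dict[str, float]:
    """Fraction of texts that mention each risk factor at least once."""
    patterns = {
        factor: re.compile(
            r"\b(?:" + "|".join(re.escape(s) for s in synonyms) + r")\b",
            re.IGNORECASE,
        )
        for factor, synonyms in terms.items()
    }
    total = len(texts)
    return {
        factor: (sum(1 for t in texts if pattern.search(t)) / total if total else 0.0)
        for factor, pattern in patterns.items()
    }


# Hypothetical eligibility snippets, written for illustration only.
sample_texts = [
    "Adults with type 2 diabetes and hypertension",
    "Healthy volunteers aged 18 and over",
    "History of asthma; obese patients excluded",
    "Diabetic patients with confirmed infection",
]
rates = mention_rates(sample_texts)
```

On real registry data, the same function would be applied to the combined study description and eligibility criteria text of each record.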
Large volumes of publications are being produced in the biomedical sciences at an ever-increasing speed. To deal with the large amount of unstructured text data, effective natural language processing (NLP) methods need to be developed for various tasks such as document classification and information extraction. The BioCreative Challenge was established to evaluate the effectiveness of information extraction methods in the biomedical domain and to facilitate their development as a community-wide effort. In this paper, we summarize our work and what we have learned from the latest round, BioCreative Challenge VII, where we participated in all five tracks. Overall, we found three key components for achieving high performance across a variety of NLP tasks: (1) pre-trained NLP models; (2) data augmentation strategies; and (3) ensemble modelling. These three strategies need to be tailored to the specific task at hand to achieve high-performing baseline models, which are usually good enough for practical applications. When further combined with task-specific methods, additional improvements (usually rather small) can be achieved, which might be critical for winning competitions. Database URL: https://doi.org/10.1093/database/baac066
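Ensemble modelling, the third component listed above, is often realized by combining the outputs of several models. The sketch below shows one common variant, per-token majority voting over label sequences; it is an illustrative assumption, since the abstract does not say which ensembling scheme was used, and the label names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token label sequences from several models by majority vote.

    Ties are broken in favor of the earliest model in the input order.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all prediction sequences must have the same length")

    voted: List[str] = []
    for labels in zip(*predictions):  # labels for one token across all models
        counts = Counter(labels)
        top = max(counts.values())
        # First label (in model order) that reaches the top count.
        voted.append(next(label for label in labels if counts[label] == top))
    return voted


# Hypothetical outputs of three models for a three-token sentence.
model_outputs = [
    ["B-Chem", "O", "O"],
    ["B-Chem", "I-Chem", "O"],
    ["O", "I-Chem", "O"],
]
combined = majority_vote(model_outputs)
```

Listing the strongest model first makes the tie-breaking rule favor its predictions, which is one simple way to encode model priority without weights.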