A Framework for Developing and Evaluating Word Embeddings of Drug-named Entity

Zhao, Mengnan; Masino, Aaron J.; Yang, Christopher C.

doi:10.18653/v1/w18-2319

Cited by 16 publications

(14 citation statements)

References 17 publications


“…One straightforward approach is to get more labeled data containing entities mentioned above to train our model. Zhao et al [44] showed that training on a specific domain dataset provided better performance than training on a large, general domain dataset. Moreover, using more Chinese clinical corpus to train the Bert-based embedding may be another way to improve the recognition performances of long and complex entities.…”

Section: Discussion (mentioning)

confidence: 99%

A novel deep learning approach to extract Chinese clinical entities for lung cancer screening and staging

Zhang, Duan, et al. 2021

BMC Med Inform Decis Mak


Background: Computed tomography (CT) reports record a large volume of valuable information about patients’ conditions and radiologists' interpretations of radiology images, which can be used for clinical decision-making and further academic study. However, the free-text nature of clinical reports is a critical barrier to using this data more effectively. In this study, we investigate a novel deep learning method to extract entities from Chinese CT reports for lung cancer screening and TNM staging. Methods: The proposed approach presents a new named entity recognition algorithm, namely the BERT-based-BiLSTM-Transformer network (BERT-BTN) with pre-training, to extract clinical entities for lung cancer screening and staging. Specifically, instead of traditional word embedding methods, BERT is applied to learn the deep semantic representations of characters. Following the long short-term memory layer, a Transformer layer is added to capture the global dependencies between characters. In addition, a pre-training technique is employed to alleviate the problem of insufficient labeled data. Results: We verify the effectiveness of the proposed approach on a clinical dataset containing 359 CT reports collected from the Department of Thoracic Surgery II of Peking University Cancer Hospital. The experimental results show that the proposed approach achieves an 85.96% macro-F1 score under the exact match scheme, which improves the performance by 1.38%, 1.84%, 3.81%, 4.29%, 5.12%, 5.29% and 8.84% compared to BERT-BTN, BERT-LSTM, BERT-fine-tune, BERT-Transformer, FastText-BTN, FastText-BiLSTM and FastText-Transformer, respectively. Conclusions: In this study, we developed a novel deep learning method, i.e., BERT-BTN with pre-training, to extract the clinical entities from Chinese CT reports. The experimental results indicate that the proposed approach can efficiently recognize various clinical entities about lung cancer screening and staging, which shows the potential for further clinical decision-making and academic research.


“…Seok et al used CRF as a learning algorithm and applied word embedding feature for NE extraction purpose [44]. A few other NER tasks which used word embedding for identifying names were [45][46][47]. There are multiple embedding techniques like 'Word2Vec' and 'GloVe', etc.…”

Section: Related Work (mentioning)

confidence: 99%

Word Embedding and String-Matching Techniques for Automobile Entity Name Identification from Web Reviews

Maity, Das, Majumder, et al. 2021

ICST Transactions on Scalable Information Systems


With the huge popularity of the Internet, information on a wide range of domains circulates across social media platforms. Extracting this information for use in natural language processing applications first requires identifying names. This study identifies automobile names in noisy web reviews using two widely used machine learning algorithms, Conditional Random Field and Support Vector Machine. The accuracy of machine learning classifiers depends heavily on the size and quality of the training data, which was prepared manually from a discussion forum corpus; because that task is time-consuming and laborious, word embedding is adopted to reduce the burden. Although word embedding enhances the system's performance, it is unable to spot noisy names that occur in web reviews. A gazetteer-based string-matching technique is therefore proposed; it recognizes a new set of noisy automobile entities, resulting in a considerable improvement in accuracy.


“…In recent years, there has been extensive work to leverage biomedical and clinical texts to develop word embeddings [27]. For example, clinical word embeddings have been trained to identify drugs [28], substance abuse terms [6], and anatomical locations [13]. More recently, word embeddings have been used to understand the COVID-19 pandemic.…”

Section: Covid-19 and Word Embeddings (mentioning)

confidence: 99%

An Intrinsic and Extrinsic Evaluation of Learned COVID-19 Concepts using Open-Source Word Embedding Sources

Parikh, Davoudi, et al. 2021

Preprint


Introduction: Scientists are developing new computational methods and prediction models to better clinically understand COVID-19 prevalence, treatment efficacy, and patient outcomes. These efforts could be improved by leveraging documented, COVID-19-related symptoms, findings, and disorders from clinical text sources in the electronic health record. Word embeddings can identify terms related to these clinical concepts from both the biomedical and non-biomedical domains and are being shared with the open-source community at large. However, it’s unclear how useful openly-available word embeddings are for developing lexicons for COVID-19-related concepts. Objective: Given an initial lexicon of COVID-19-related terms, characterize the returned terms by similarity across various, open-source word embeddings and determine common semantic and syntactic patterns between the COVID-19 queried terms and returned terms specific to word embedding source. Materials and Methods: We compared 7 openly-available word embedding sources. Using a series of COVID-19-related terms for associated symptoms, findings, and disorders, we conducted an inter-annotator agreement study to determine how accurately the most semantically similar returned terms could be classified according to semantic types by three annotators. We conducted a qualitative study of COVID-19 queried terms and their returned terms to identify useful patterns for constructing lexicons. We demonstrated the utility of applying such terms to discharge summaries by reporting the proportion of patients identified by concept for pneumonia, acute respiratory distress syndrome, and COVID-19 cohorts. Results: We observed high, pairwise inter-annotator agreement (Cohen’s Kappa) for symptoms (0.86 to 0.99), findings (0.93 to 0.99), and disorders (0.93 to 0.99). Word embedding sources generated based on characters tend to return more lexical variants and synonyms; in contrast, embeddings based on tokens more often return a variety of semantic types. Word embedding sources queried using an adjective phrase compared to a single term (e.g., dry cough vs. cough; muscle pain vs. pain) are more likely to return qualifiers of the same semantic type (e.g., “dry” returns consistency qualifiers like “wet”, “runny”). Terms for fever, cough, shortness of breath, and hypoxia retrieved a higher proportion of patients than other clinical features. Terms for dry cough returned a higher proportion of COVID-19 patients than pneumonia and ARDS populations. Discussion: Word embeddings are a valuable technology for learning terms, including synonyms. When leveraging openly-available word embedding sources, choices made for the construction of the word embeddings can significantly influence the phrases returned.
