2021
DOI: 10.1093/bioinformatics/btab474

Medical concept normalization in clinical trials with drug and disease representation learning

Abstract: Motivation: Clinical trials are an essential stage of every drug development program; they are required before a treatment can become available to patients. Despite the importance of well-structured clinical trial databases and their tremendous value for drug discovery and development, such resources are very rare. Presently, large-scale information on clinical trials is stored in clinical trial registers, which are relatively structured, but the mappings to external databases of drugs and diseases are increasingly …

Cited by 19 publications
(17 citation statements)
References 42 publications
“…For training, we have used the publicly available code provided by the authors at https://github.com/dmis-lab/BioSyn with the following parameters: the number of top candidates k is 20, the mini-batch size is 16, the learning rate is 1e-5, the dense ratio for candidate retrieval is 0.5, the number of epochs is 5. To deal with nil prediction, we apply the strategy from (Miftahutdinov et al., 2021); a mention is out of KB if the nearest candidate is further than a threshold in terms of a weighted average of two distances: minimum distance of false positives and maximum distance of true positives, as computed on the train set. Following previous works on entity linking (Suominen et al., 2013; Pradhan et al., 2014; Wright et al., 2019; Phan et al., 2019; Sung et al., 2020; Miftahutdinov et al., 2021; Tutubalina et al., 2020), we use top-k accuracy as the evaluation metric: Acc@k = 1 if the correct CUI is retrieved at rank ≤ k, otherwise Acc@k = 0.…”
Section: Discussion
confidence: 99%
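The nil-prediction rule quoted above can be sketched in a few lines. The function names and the equal weighting below are illustrative assumptions, not the exact implementation of Miftahutdinov et al. (2021):

```python
# Sketch of the nil-prediction threshold described above. Names and the
# equal weighting of the two distances are illustrative assumptions.

def nil_threshold(fp_distances, tp_distances, weight=0.5):
    """Threshold = weighted average of the minimum distance among false
    positives and the maximum distance among true positives, both
    computed on the training set."""
    return weight * min(fp_distances) + (1 - weight) * max(tp_distances)

def is_out_of_kb(nearest_candidate_distance, threshold):
    """A mention is mapped to NIL (out of KB) if its nearest knowledge-base
    candidate is farther away than the threshold."""
    return nearest_candidate_distance > threshold

# Toy distances standing in for values measured on a training set:
threshold = nil_threshold(fp_distances=[0.9, 1.2], tp_distances=[0.3, 0.6])
# threshold = 0.5 * 0.9 + 0.5 * 0.6 = 0.75
print(is_out_of_kb(0.8, threshold))  # True: 0.8 > 0.75
```

At inference time, the distance of the single nearest retrieved candidate is compared against this fixed, train-set-derived threshold.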
“…To deal with nil prediction, we apply the strategy from (Miftahutdinov et al., 2021); a mention is out of KB if the nearest candidate is further than a threshold in terms of a weighted average of two distances: minimum distance of false positives and maximum distance of true positives, as computed on the train set. Following previous works on entity linking (Suominen et al., 2013; Pradhan et al., 2014; Wright et al., 2019; Phan et al., 2019; Sung et al., 2020; Miftahutdinov et al., 2021; Tutubalina et al., 2020), we use top-k accuracy as the evaluation metric: Acc@k = 1 if the correct CUI is retrieved at rank ≤ k, otherwise Acc@k = 0. Table 3 shows the Acc@1 and Acc@5 metrics for our test sets.…”
Section: Discussion
confidence: 99%
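The top-k accuracy metric defined in these statements translates directly into code. This is a minimal sketch; the function names and the toy CUI strings are illustrative:

```python
def acc_at_k(ranked_cuis, gold_cui, k):
    """Acc@k = 1 if the correct CUI is retrieved at rank <= k, else 0."""
    return 1 if gold_cui in ranked_cuis[:k] else 0

def mean_acc_at_k(predictions, k):
    """Average Acc@k over (ranked_cuis, gold_cui) pairs for a test set."""
    return sum(acc_at_k(r, g, k) for r, g in predictions) / len(predictions)

# Toy example: the gold CUI sits at rank 3, so it counts toward Acc@5
# but not toward Acc@1. The CUI strings here are placeholders.
ranked = ["C0000001", "C0000002", "C0000003"]
print(acc_at_k(ranked, "C0000003", 1))  # 0
print(acc_at_k(ranked, "C0000003", 5))  # 1
```

Reporting both Acc@1 and Acc@5, as Table 3 of the citing paper does, separates exact top-ranked hits from near-misses recoverable by a human reviewer scanning a short candidate list.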
“…The dataset is the property of Insilico Medicine and is commercially available as a part of the inClinico platform. All trials were mapped to therapies and conditions by a natural language processing (NLP) pipeline based on the state-of-the-art Drug and Disease Interpretation Learning with Biomedical Entity Representation Transformer (DILBERT). 21–23 This NLP pipeline incorporates two modules: (i) a named entity recognition module; and (ii) an entity linking module.…”
Section: Clinical Trial Dataset and Model Architectures
confidence: 99%
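The two-module structure described above (named entity recognition feeding entity linking) can be sketched as follows. Every name, the toy vocabulary, and the exact-match lookup are illustrative assumptions, not the actual DILBERT pipeline, which uses trained transformer models for both stages:

```python
# Illustrative two-stage pipeline: (i) NER extracts mentions from trial
# text; (ii) entity linking maps each mention to a knowledge-base concept.
# All names, the vocabulary, and the CUI strings are hypothetical.

def recognize_entities(text):
    """Stand-in NER: dictionary scan returning (mention, type) pairs.
    A real system would use a trained sequence-labeling model."""
    vocabulary = {"metformin": "DRUG", "type 2 diabetes": "DISEASE"}
    lowered = text.lower()
    return [(m, t) for m, t in vocabulary.items() if m in lowered]

def link_entity(mention, kb):
    """Stand-in entity linking: exact-match lookup against a toy KB.
    A real system would rank concepts by embedding similarity and
    apply a nil-prediction threshold for out-of-KB mentions."""
    return kb.get(mention, "NIL")

toy_kb = {"metformin": "CUI_DRUG_01", "type 2 diabetes": "CUI_DIS_01"}
text = "Patients received Metformin for Type 2 Diabetes."
mentions = recognize_entities(text)
linked = [(m, t, link_entity(m, toy_kb)) for m, t in mentions]
print(linked)  # each mention paired with its type and linked concept ID
```

The separation matters operationally: the NER module can be retrained on new trial registries without touching the linking module, and vice versa.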
“…One of the key reasons for the success of Transformer-based models, 31 such as BERT 32 or GPT-3, 33 was access to a huge training corpus. It has been shown in the domain of medicinal chemistry 34 that, for disease linking in clinical trials, the degradation in accuracy when the full dictionary is reduced to 30% of its size is significant. Beyond quality gains, bigger and more diverse datasets are important for model robustness.…”
Section: Introduction
confidence: 99%