Concept selection for phenotypes and diseases using learn to rank

Collier, Nigel; Oellrich, Anika; Groza, Tudor

doi:10.1186/s13326-015-0019-z

Cited by 12 publications

(11 citation statements)

References 29 publications

(21 reference statements)

“…It is interesting to find that BeCAS which uses deterministic finite automatons, performed worse than simple dictionary lookup. However, BeCAS has shown similar type of performance in previous studies (51, 52).…”

Section: Results (supporting)

confidence: 74%

Improving the dictionary lookup approach for disease normalization using enhanced dictionary and query expansion

et al. 2016

The rapidly increasing biomedical literature calls for an automatic approach to the recognition and normalization of disease mentions in order to increase the precision and effectiveness of disease-based information retrieval. A variety of methods have been proposed to deal with the problem of disease named entity recognition and normalization. Among the proposed methods, conditional random fields (CRFs) and dictionary lookup are widely used for named entity recognition and normalization, respectively. We herein developed a CRF-based model to allow automated recognition of disease mentions, and studied the effect of various techniques in improving the normalization results based on the dictionary lookup approach. The dataset from the BioCreative V CDR track was used to report the performance of the developed normalization methods and to compare them with other existing dictionary-lookup-based normalization methods. The best configuration achieved an F-measure of 0.77 for disease normalization, outperforming the best dictionary-lookup-based baseline method studied in this work by 0.13 in F-measure. Database URL: https://github.com/TCRNBioinformatics/DiseaseExtract

“…"white blood cell" instead of "blood cell" or "cell"). Collier et al [26] applied MetaMap and cTAKES to the extraction of phenotypes and other related concepts that concern the diagnosis and treatment of diseases. They concluded that cTAKES performs well overall but that annotation performance varies widely across semantic types, and that MetaMap with the strict matching and word sense disambiguation features enabled can have superior precision.…”

Section: Diagnostic Knowledge Extraction Using MetaMap and cTAKES (mentioning)

confidence: 99%

Extracting Diagnostic Knowledge from MedLine Plus: A Comparison between MetaMap and cTAKES Approaches

Rodríguez-González, Costumero, Martínez-Romero, et al. 2018

CBIO

Abstract: The development of diagnostic decision support systems (DDSS) requires having a reliable and consistent knowledge base about diseases and their symptoms, signs and diagnostic tests. Physicians are typically the source of this knowledge, but it is not always possible to obtain all the desired information from them. Other valuable sources are medical books and articles describing the diagnosis of diseases, but again, extracting this information is a hard and time-consuming task. In this paper we present the results of our research, in which we have used Web scraping, natural language processing techniques, a variety of publicly available sources of diagnostic knowledge and two widely known medical concept identifiers, MetaMap and cTAKES, to extract diagnostic criteria for infectious diseases from MedLine Plus articles. A performance comparison of MetaMap and cTAKES is also presented.

“…The majority of normalisation methods are based on matching entity mentions against concept synonyms listed in a terminological resource (e.g., [22, 23, 51–53]); more sophisticated methods combine or rank the results obtained using a number of different terminological resources [54, 55]. Approaches based on pattern-matching or regular expressions (e.g., [56–58]) can account for frequently occurring variations not listed in the terminological resource (e.g., Greek or Roman suffixes for genes) and/or by helping to post-process initial normalisation output [59], in order to better handle problematic cases such as abbreviations or coordinated phrases [60].…”

Section: Related Work (mentioning)

confidence: 99%

Mapping Phenotypic Information in Heterogeneous Textual Sources to a Domain-Specific Terminological Resource

Alnazzawi, Thompson, Ananiadou 2016

PLoS ONE

Biomedical literature articles and narrative content from Electronic Health Records (EHRs) both constitute rich sources of disease-phenotype information. Phenotype concepts may be mentioned in text in multiple ways, using phrases with a variety of structures. This variability stems partly from the different backgrounds of the authors, but also from the different writing styles typically used in each text type. Since EHR narrative reports and literature articles contain different but complementary types of valuable information, combining details from each text type can help to uncover new disease-phenotype associations. However, the alternative ways in which the same concept may be mentioned in each source constitute a barrier to the automatic integration of information. Accordingly, identification of the unique concepts represented by phrases in text can help to bridge the gap between text types. We describe our development of a novel method, PhenoNorm, which integrates a number of different similarity measures to allow automatic linking of phenotype concept mentions to known concepts in the UMLS Metathesaurus, a biomedical terminological resource. PhenoNorm was developed using the PhenoCHF corpus—a collection of literature articles and narratives in EHRs, annotated for phenotypic information relating to congestive heart failure (CHF). We evaluate the performance of PhenoNorm in linking CHF-related phenotype mentions to Metathesaurus concepts, using a newly enriched version of PhenoCHF, in which each phenotype mention has an expert-verified link to a concept in the UMLS Metathesaurus. We show that PhenoNorm outperforms a number of alternative methods applied to the same task. Furthermore, we demonstrate PhenoNorm’s wider utility, by evaluating its ability to link mentions of various other types of medically-related information, occurring in texts covering wider subject areas, to concepts in different terminological resources. We show that PhenoNorm can maintain performance levels, and that its accuracy compares favourably to other methods applied to these tasks.
