Comparison of MetaMap and cTAKES for entity extraction in clinical notes

Reátegui, Ruth; Ratté, Sylvie

doi:10.1186/s12911-018-0654-2

Cited by 66 publications

(46 citation statements)

References 15 publications


“…This heterogeneity of expressions poses challenges for efforts to use natural language processing algorithms to convert free text neurological examinations into UMLS concepts [7,8]. In a pilot study with NLM MetaMap [38,39] in the batch mode, we were able to convert 70.3% of the 2286 test phrases to UMLS concepts. A higher conversion yield might be possible with additional post-processing and pre-processing of the longer and more complex test phrases.…”

Section: Discussion (mentioning)

confidence: 99%

A Neuro-ontology for the neurological examination

Hier, Brint. 2020. BMC Med Inform Decis Mak


Background: The use of clinical data in electronic health records for machine-learning or data analytics depends on the conversion of free text into machine-readable codes. We have examined the feasibility of capturing the neurological examination as machine-readable codes based on UMLS Metathesaurus concepts. Methods: We created a target ontology for capturing the neurological examination using 1100 concepts from the UMLS Metathesaurus. We created a dataset of 2386 test-phrases based on 419 published neurological cases. We then mapped the test-phrases to the target ontology. Results: We were able to map all of the 2386 test-phrases to 601 unique UMLS concepts. A neurological examination ontology with 1100 concepts has sufficient breadth and depth of coverage to encode all of the neurologic concepts derived from the 419 test cases. Using only pre-coordinated concepts, component ontologies of the UMLS, such as HPO, SNOMED CT, and OMIM, do not have adequate depth and breadth of coverage to encode the complexity of the neurological examination. Conclusion: An ontology based on a subset of UMLS has sufficient breadth and depth of coverage to convert deficits from the neurological examination into machine-readable codes using pre-coordinated concepts. The use of a small subset of UMLS concepts for a neurological examination ontology offers the advantage of improved manageability as well as the opportunity to curate the hierarchy and subsumption relationships.


“…Existing tools, such as MetaMap and cTAKES, provide programmatic means for mapping text to concepts in the UMLS [29]. However, UMLS was designed for written text, not for spoken medical conversations. The differences in (1) spoken vs. written language and (2) lay vs. expert terminology cause inaccuracies and word mismatching when using existing tools for medical language processing from medical conversations.…”

Section: Challenge 3: Information Extraction In Clinical Conversations (mentioning)

confidence: 99%

“…With MetaMap's default settings, the phrase "I am feeling fine" would result in "I" mapped to "blood group antibody I", "feeling" mapped to "emotions", and "fine" mapped to "qualitative concept" or "legal fine". Therefore, additional steps must be taken to identify semantic types and groups to control the way text is mapped to medical concepts [29] or develop rules to filter irrelevant terms, which, depending on the text, can be a time-consuming trial-and-error process.…”

Section: Challenge 3: Information Extraction In Clinical Conversations (mentioning)

confidence: 99%
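
The filtering step described in the statement above can be illustrated with a minimal sketch. It does not call MetaMap; the `Candidate` tuple, the candidate list, the semantic type abbreviations attached to each concept, and the allow-list are all assumptions made for illustration, loosely mirroring the "I am feeling fine" example.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one text-to-concept mapping (not MetaMap's output format)."""
    text: str           # surface string found in the transcript
    concept: str        # concept name the string was mapped to
    semantic_type: str  # UMLS semantic type abbreviation (assumed for illustration)


# Assumed candidates for "I am feeling fine", following the quoted example.
CANDIDATES = [
    Candidate("I", "blood group antibody I", "imft"),
    Candidate("feeling", "emotions", "menp"),
    Candidate("fine", "qualitative concept", "qlco"),
    Candidate("fine", "legal fine", "gora"),
]

# Assumed allow-list: keep only types judged clinically relevant for this task.
ALLOWED_TYPES = {"sosy", "dsyn", "menp"}


def filter_by_semantic_type(candidates, allowed):
    """Keep only the candidates whose semantic type is in the allow-list."""
    return [c for c in candidates if c.semantic_type in allowed]


kept = filter_by_semantic_type(CANDIDATES, ALLOWED_TYPES)
for c in kept:
    print(f"{c.text!r} -> {c.concept} [{c.semantic_type}]")
```

Choosing the allow-list is the trial-and-error part the statement refers to: a list that is too narrow drops relevant findings, while a list that is too broad lets spurious mappings through.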

Challenges of developing a digital scribe to reduce clinical documentation burden

et al. 2019


Clinicians spend a large amount of time on clinical documentation of patient encounters, often impacting quality of care and clinician satisfaction, and causing physician burnout. Advances in artificial intelligence (AI) and machine learning (ML) open the possibility of automating clinical documentation with digital scribes, using speech recognition to eliminate manual documentation by clinicians or medical scribes. However, developing a digital scribe is fraught with problems due to the complex nature of clinical environments and clinical conversations. This paper identifies and discusses major challenges associated with developing automated speech-based documentation in clinical settings: recording high-quality audio, converting audio to transcripts using speech recognition, inducing topic structure from conversation data, extracting medical concepts, generating clinically meaningful summaries of conversations, and obtaining clinical data for AI and ML algorithms.
npj Digital Medicine (2019) 2:114; https://doi.


“…For example, Hassanzadeh et al evaluated the NER tools used by the studies in Table 1 and found that the F1-score ranged from 5% to 75% for different types of UMLS concepts [24]. Likewise, Reátegui et al found that the F1-score of the NER tools varied from 44% to 96% for different types of diseases [26]. Importantly, errors produced in the NER step may diminish the effectiveness of bio-concept embeddings.…”

Section: Introduction (mentioning)

confidence: 99%
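
For readers unfamiliar with the metric cited in the statement above, the F1-score is the harmonic mean of precision and recall. The sketch below computes it from true-positive, false-positive, and false-negative counts; the function name and the counts in the usage lines are hypothetical and are not taken from either cited study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score given true positives, false positives, and false negatives."""
    if tp == 0:
        # No correct extractions: precision and recall are zero (or undefined),
        # so F1 is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)  # share of extracted entities that are correct
    recall = tp / (tp + fn)     # share of gold-standard entities that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one entity type: 80 correct, 20 spurious, 20 missed.
score = f1_score(tp=80, fp=20, fn=20)
print(f"F1 = {score:.2f}")
```

Computing the score separately per entity type, as both cited evaluations did, shows how a tool can perform well on one category and poorly on another.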

BioConceptVec: Creating and evaluating literature-based biomedical concept embeddings on a large scale

et al. 2020


A massive number of biological entities, such as genes and mutations, are mentioned in the biomedical literature. Capturing the semantic relatedness of biological entities is vital to many biological applications, such as protein-protein interaction prediction and literature-based discovery. Concept embeddings, which involve learning vector representations of concepts using machine learning models, have been employed to capture the semantics of concepts. To develop concept embeddings, named-entity recognition (NER) tools are first used to identify and normalize concepts from the literature, and then different machine learning models are used to train the embeddings. Despite multiple attempts, existing biomedical concept embeddings generally suffer from suboptimal NER tools, small-scale evaluation, and limited availability. In response, we employed high-performance machine learning-based NER tools for concept recognition and trained our concept embeddings, BioConceptVec, via four different machine learning models on ~30 million PubMed abstracts. BioConceptVec covers over 400,000 biomedical concepts mentioned in the literature and is among the largest publicly available biomedical concept embeddings to date. To evaluate the validity and utility of BioConceptVec, we performed two intrinsic evaluations (identifying related concepts based on drug-gene and gene-gene interactions) and two extrinsic evaluations (protein-protein interaction prediction and drug-drug interaction extraction), collectively using over 25 million instances from nine independent datasets (17 million instances from six intrinsic evaluation tasks and 8 million instances from three extrinsic evaluation tasks), which is, to the best of our knowledge, by far the most comprehensive. The intrinsic evaluation results demonstrate that BioConceptVec consistently has, by a large margin, better performance than existing concept embeddings in identifying similar and related concepts. More importantly, the extrinsic evaluation results demonstrate that using BioConceptVec with advanced deep learning models can significantly improve performance in downstream bioinformatics studies and biomedical text-mining applications. Our BioConceptVec embeddings and benchmarking datasets are publicly available at https://github.com/ncbi-nlp/BioConceptVec.
Author Summary: Capturing the semantics of related biological concepts, such as genes and mutations, is of significant importance to many research tasks in computational biology such as protein-protein interaction detection, gene-drug association prediction, and biomedical literature-based discovery. Here, we propose to leverage state-of-the-art text mining tools and machine learning models to learn the semantics via vector representations (a.k.a. embeddings) of over 400,000 biological concepts mentioned in the entire set of PubMed abstracts. Our learned embeddings, namely BioConceptVec, can capture related concepts based on their surrounding contextual information in the literature, which is beyond exact term ...
