We demonstrated that models using multiple electronic health record data sources systematically outperform models using data sources in isolation in the task of predicting ICD-9-CM codes over a broad range of medical specialties.
A multitude of information sources is present in the electronic health record (EHR), each of which can contain clues for automatically assigning diagnosis and procedure codes. These sources, however, show information overlap and quality differences, which complicates the retrieval of these clues. Through feature selection, a denser representation with consistent quality and less information overlap can be obtained. We introduce and compare coverage-based feature selection methods, based on confidence and information gain. These approaches were evaluated over a range of medical specialties, with seven different medical specialties for ICD-9-CM code prediction (six at the Antwerp University Hospital and one in the MIMIC-III dataset) and two different medical specialties for ICD-10-CM code prediction. Using confidence coverage to integrate all sources in an EHR shows a consistent improvement in F-measure (49.83% for diagnosis codes on average), both compared with the baseline (44.25% for diagnosis codes on average) and with using the best standalone source (44.41% for diagnosis codes on average). Confidence coverage creates a concise patient stay representation that is independent of a rigid framework such as UMLS and contains easily interpretable features. Confidence coverage has several advantages over a baseline setup. In our baseline setup, feature selection was limited to a filter removing features with fewer than five total occurrences in the training set. Prediction results improved consistently when using multiple heterogeneous sources to predict clinical codes, while reducing the number of features and the processing time.
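The two simplest ingredients named in this abstract can be sketched in a few lines: the baseline filter that drops rare features, and information gain as a score for a single feature. This is a minimal illustration only. The function names, the representation of a patient stay as a list of feature strings, and the binary code label are assumptions made for the example. The abstract does not define confidence coverage, so it is not implemented here.

```python
import math
from collections import Counter


def frequency_filter(stays, min_count=5):
    """Drop features with fewer than min_count total occurrences in the training set."""
    totals = Counter()
    for features in stays:
        totals.update(features)
    kept = {f for f, n in totals.items() if n >= min_count}
    return [[f for f in features if f in kept] for features in stays]


def entropy(positives, total):
    """Binary entropy (in bits) of a label set with `positives` out of `total`."""
    if total == 0 or positives == 0 or positives == total:
        return 0.0
    p = positives / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(stays, labels, feature):
    """Information gain of a present/absent feature for a binary code label (0 or 1)."""
    n = len(stays)
    with_f = [y for x, y in zip(stays, labels) if feature in x]
    without_f = [y for x, y in zip(stays, labels) if feature not in x]
    h_before = entropy(sum(labels), n)
    h_after = (len(with_f) / n) * entropy(sum(with_f), len(with_f)) \
        + (len(without_f) / n) * entropy(sum(without_f), len(without_f))
    return h_before - h_after


# Toy data (hypothetical): four stays, label 1 means the code was assigned.
stays = [["fever", "cough"], ["fever"], ["cough"], []]
labels = [1, 1, 0, 0]
print(information_gain(stays, labels, "fever"))
print(information_gain(stays, labels, "cough"))
```

In the toy data, "fever" separates the two classes perfectly while "cough" is split evenly across them, so the first feature would rank far above the second under this score.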
Clinical codes are used for public reporting purposes, are fundamental to determining public financing for hospitals, and form the basis for reimbursement claims to insurance providers. They are assigned to a patient stay to reflect the diagnosis and performed procedures during that stay. This paper aims to enrich algorithms for automated clinical coding by taking a data-driven approach and by using unsupervised and semi-supervised techniques for the extraction of multi-word expressions that convey a generalisable medical meaning (referred to as concepts). Several methods for extracting concepts from text are compared, two of which are constructed from a large unannotated corpus of clinical free text. A distributional semantic model (in this case the word2vec skip-gram model) is used to generalise over concepts and retrieve relations between them. These methods are validated on three sets of patient stay data, in the disease areas of urology, cardiology, and gastroenterology. The datasets are in Dutch, which limits the concept definitions available from expert-based ontologies (e.g. UMLS). The results show that when expert-based knowledge in ontologies is unavailable, concepts derived from raw clinical texts are a reliable alternative. Both concepts derived from raw clinical texts and concepts derived from expert-created dictionaries outperform a bag-of-words approach in clinical code assignment. Adding features based on tokens that appear in a semantically similar context has a positive influence on the prediction of diagnostic codes. Furthermore, the experiments indicate that a distributional semantic model can find relations between semantically related concepts in texts but also introduces erroneous and redundant relations, which can undermine clinical coding performance.
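The idea of adding features for tokens that occur in a semantically similar context can be illustrated with cosine similarity over word vectors. The sketch below is an assumption-laden toy: the vectors are hand-made two-dimensional values rather than word2vec output, and the names `cosine`, `expand_tokens`, and the `threshold` parameter are hypothetical, not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_tokens(tokens, vectors, threshold=0.9):
    """Add every vocabulary token whose vector is close to a token in the stay."""
    expanded = set(tokens)
    for token in tokens:
        if token not in vectors:
            continue
        for candidate, vec in vectors.items():
            if cosine(vectors[token], vec) >= threshold:
                expanded.add(candidate)
    return expanded


# Hand-made toy vectors (hypothetical, not the output of a trained model).
vectors = {
    "kidney_stone": [1.0, 0.1],
    "renal_calculus": [0.9, 0.2],
    "heart_failure": [0.0, 1.0],
}
print(sorted(expand_tokens(["kidney_stone"], vectors)))
```

A lower threshold adds more neighbours per token, which is one way the erroneous and redundant relations mentioned in the abstract could enter the feature set.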
Background Few case reports on human infections with the beef tapeworm, Taenia saginata, and the pork tapeworm, Taenia solium, diagnosed in Belgium have been published, yet the grey literature suggests a higher number of cases. Aim To identify and describe cases of taeniasis and cysticercosis diagnosed at two Belgian referral medical institutions from 1990 to 2015. Methods In this observational study we retrospectively gathered data on taeniasis and cysticercosis cases by screening laboratory and medical record databases as well as a uniform hospital discharge dataset. Results A total of 221 confirmed taeniasis cases were identified. All cases for whom the causative species could be determined (170/221, 76.9%) were found to be T. saginata infections. Of those with available information, 40.0% were asymptomatic (26/65), 15.4% reported diarrhoea (10/65), 9.2% reported anal discomfort (6/65) and 15.7% acquired the infection in Belgium (11/70). Five definitive and six probable cases of neurocysticercosis (NCC), and two cases of non-central nervous system cysticercosis (non-CNS CC) were identified. Common symptoms and signs in five of the definitive and probable NCC cases were epilepsy, headaches and/or other neurological disorders. Travel information was available for 10 of the 13 NCC and non-CNS CC cases; two were Belgians travelling to and eight were immigrants or visitors travelling from endemic areas. Conclusions The current study indicates that a non-negligible number of taeniasis cases visit Belgian medical facilities, and that cysticercosis is occasionally diagnosed in international travellers.