Using concept hierarchies to improve calculation of patient similarity

Girardi, Dominic; Wartner, Sandra; Halmerbauer, Gerhard; Ehrenmüller, Margit; Kosorus, Hilda; Dreiseitl, Stephan

doi:10.1016/j.jbi.2016.07.021

Cited by 30 publications

(28 citation statements)

References 12 publications


“…This is known as post-coordination [15]. Concept ontologies that are organized hierarchically support the calculation of inter-concept distances [16][17][18][19][20][21][22][23][24].…”

Section: The Ontology Should Be Organized Hierarchically (mentioning)

confidence: 99%

A Neuro-ontology for the neurological examination

Hier

Brint

2020

BMC Med Inform Decis Mak


Background: The use of clinical data in electronic health records for machine-learning or data analytics depends on the conversion of free text into machine-readable codes. We have examined the feasibility of capturing the neurological examination as machine-readable codes based on UMLS Metathesaurus concepts. Methods: We created a target ontology for capturing the neurological examination using 1100 concepts from the UMLS Metathesaurus. We created a dataset of 2386 test-phrases based on 419 published neurological cases. We then mapped the test-phrases to the target ontology. Results: We were able to map all of the 2386 test-phrases to 601 unique UMLS concepts. A neurological examination ontology with 1100 concepts has sufficient breadth and depth of coverage to encode all of the neurologic concepts derived from the 419 test cases. Using only pre-coordinated concepts, component ontologies of the UMLS, such as HPO, SNOMED CT, and OMIM, do not have adequate depth and breadth of coverage to encode the complexity of the neurological examination. Conclusion: An ontology based on a subset of UMLS has sufficient breadth and depth of coverage to convert deficits from the neurological examination into machine-readable codes using pre-coordinated concepts. The use of a small subset of UMLS concepts for a neurological examination ontology offers the advantage of improved manageability as well as the opportunity to curate the hierarchy and subsumption relationships.


“…The measurement of semantic similarity of two concepts was measured using the equation proposed by Girardi et al [5], Leacock & Chodorow [6], and Rada et al [7].…”

Section: Semantic Similarity Between Concepts (mentioning)

confidence: 99%
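
As a rough illustration of the path-based measures named in this citation statement, the sketch below computes the Rada et al. shortest-path distance and a commonly cited form of the Leacock & Chodorow similarity on a toy hierarchy. The hierarchy fragment, the function names, and the counting convention (path length in edges plus one, depth counted in nodes) are assumptions made for illustration. The Girardi et al. measure is not reproduced because its equation does not appear in this excerpt.

```python
import math

# Toy fragment of an ICD-10-like hierarchy (child -> parent), for illustration only.
PARENT = {
    "IX": None,          # chapter: diseases of the circulatory system
    "I10-I15": "IX",     # block: hypertensive diseases
    "I20-I25": "IX",     # block: ischaemic heart diseases
    "I10": "I10-I15",
    "I20": "I20-I25",
    "I21": "I20-I25",
}


def ancestors(code):
    """Return the chain from a code up to the root, starting with the code itself."""
    chain = [code]
    while PARENT[code] is not None:
        code = PARENT[code]
        chain.append(code)
    return chain


def path_length(a, b):
    """Rada-style distance: number of edges on the shortest path between a and b."""
    up_a = ancestors(a)
    steps_from_b = {node: i for i, node in enumerate(ancestors(b))}
    for steps_from_a, node in enumerate(up_a):
        if node in steps_from_b:  # first shared node is the lowest common ancestor
            return steps_from_a + steps_from_b[node]
    raise ValueError("concepts share no common ancestor")


def lch_similarity(a, b, max_depth):
    """Leacock & Chodorow-style similarity: -log((edges + 1) / (2 * max_depth))."""
    return -math.log((path_length(a, b) + 1) / (2.0 * max_depth))


if __name__ == "__main__":
    depth = 3  # chapter, block, code: depth of the toy hierarchy in nodes
    for pair in [("I20", "I21"), ("I20", "I10")]:
        print(pair, path_length(*pair), round(lch_similarity(*pair, depth), 3))
```

Codes in the same block (I20, I21) are two edges apart, while codes in different blocks of the same chapter (I20, I10) are four edges apart, so the former pair receives the higher similarity.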

“…The process for calculating the similarity value of patient data with centroid using the semantic approach and Euclidean distance. Semantic similarity between concepts is calculated using equations (2), (3), and (4), and the semantic similarity between sets of concepts is calculated using equation (5). Jaccard similarity is calculated using equation (6).…”

Section: K-means Similarity (mentioning)

confidence: 99%
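
The contrast between Jaccard similarity and a set-level semantic similarity can be sketched as follows. Equations (5) and (6) of the citing paper are not shown in this excerpt, so the code uses the standard Jaccard definition and, as an assumption, a best-match average for comparing sets of concepts. The `toy_concept_similarity` function is a hypothetical stand-in for a real hierarchy-based measure.

```python
def jaccard_similarity(a, b):
    """Standard Jaccard index: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def toy_concept_similarity(x, y):
    """Hypothetical stand-in: 1.0 for equal codes, 0.5 for same leading letter, else 0."""
    if x == y:
        return 1.0
    return 0.5 if x[0] == y[0] else 0.0


def best_match_average(a, b, sim=toy_concept_similarity):
    """Average, over both sets, of each concept's best similarity to the other set."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    best_a = sum(max(sim(x, y) for y in b) for x in a)
    best_b = sum(max(sim(x, y) for x in a) for y in b)
    return (best_a + best_b) / (len(a) + len(b))


if __name__ == "__main__":
    patient_1 = {"I20", "E11"}
    patient_2 = {"I21", "E11"}
    print(jaccard_similarity(patient_1, patient_2))   # exact matches only
    print(best_match_average(patient_1, patient_2))   # also credits related codes
```

Jaccard similarity counts only exact code matches, so the related codes I20 and I21 contribute nothing, whereas a set-level semantic measure gives them partial credit.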

“…The data used in this study are only types of text-based data and have not been able to accommodate types of categorical data represented by a hierarchical model. For data that has types of categorical data with a hierarchical model, it can be measured using the semantic similarity equations proposed by Girardi et al [5], Leacock & Chodorow [6], and Rada et al [7]. An example of data with a category type with a hierarchical model is the ICD-10, i.e.…”

Section: Introduction (mentioning)

confidence: 99%


The K-Means Clustering Algorithm With Semantic Similarity To Estimate The Cost of Hospitalization

Sarasvananda

Wardoyo

Sari

2019

Indonesian J. Comput. Cybern. Syst.


Abstract: The cost of a patient's hospitalization can be estimated by clustering patients. K-means is one of the most widely used clustering algorithms, but because it is distance-based it is weak at measuring the closeness in meaning (semantics) between data items. To overcome this problem, semantic similarity can be used to measure the similarity between objects during clustering, so that semantic proximity is taken into account. This study aims to cluster patient data according to the similarity of the patients' diseases. ICD codes are used as the guide for determining a patient's disease. The K-means method is combined with semantic similarity to measure the proximity of the patients' ICD codes. The methods used to measure semantic similarity in this study are the semantic similarity measures of Girardi, Leacock & Chodorow, and Rada, together with Jaccard similarity. Cluster quality is measured with the silhouette coefficient. In the experiments, the semantic similarity measures produced better clustering quality than Jaccard similarity. The best accuracy is 91.78% for the three semantic similarity methods, whereas the best accuracy with Jaccard similarity is 84.93%.


“…Euclidean, Manhattan, Mahalanobis, etc.). Novel approaches have been developed to estimate patient similarity which are not geometric-based, for example using machine learning models to estimate the distance between patients with decision trees [5] or random forests [6]; or the use of ontologies to extract hierarchically related diagnoses [7].…”

Section: Introduction (mentioning)

confidence: 99%

Medal: a patient similarity metric using medication prescribing patterns

Pineda

Pourshafeie

Ioannidis

et al. 2019

Preprint


Objective: Pediatric acute-onset neuropsychiatric syndrome (PANS) is a complex neuropsychiatric syndrome characterized by an abrupt onset of obsessive-compulsive symptoms and/or severe eating restrictions, along with at least two concomitant debilitating cognitive, behavioral, or neurological symptoms. A wide range of pharmacological interventions, along with behavioral and environmental modifications and psychotherapies, have been adopted to treat symptoms and underlying etiologies. Our goal was to develop a data-driven approach to identify treatment patterns in this cohort. Materials and Methods: In this cohort study, we extracted medical prescription histories from electronic health records. We developed a modified dynamic programming approach to perform global alignment of those medication histories. Our approach is unique since it considers time gaps in prescription patterns as part of the similarity strategy. Results: This study included 43 consecutive new-onset pre-pubertal patients who had at least 3 clinic visits. Our algorithm identified six clusters with distinct medication usage histories, which may represent clinicians' practice of treating PANS of different severities and etiologies, i.e., two most severe groups requiring high-dose intravenous steroids; two arthritic or inflammatory groups requiring prolonged nonsteroidal anti-inflammatory drug (NSAID) treatment; and two mild relapsing/remitting groups treated with a short course of NSAID. The psychometric scores as outcomes in each cluster generally improved within the first two years. Discussion and conclusion: Our algorithm shows potential to improve our knowledge of treatment patterns in the PANS cohort, while helping clinicians understand how patients respond to a combination of drugs.
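
The global alignment idea mentioned in this abstract can be illustrated with the textbook Needleman-Wunsch recurrence. This is only a minimal sketch: the abstract describes a modified approach that accounts for time gaps in prescriptions, and that modification is not reproduced here. The function name, the scoring values, and the example medication sequences are assumptions for illustration.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score for aligning two sequences end to end."""
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning the first i items of seq_a
    # with the first j items of seq_b.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # seq_a prefix aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # seq_b prefix aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two items
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    return score[n][m]


if __name__ == "__main__":
    # Hypothetical medication histories, one entry per prescription period.
    history_1 = ["NSAID", "steroid", "NSAID"]
    history_2 = ["NSAID", "NSAID"]
    print(global_alignment_score(history_1, history_2))
```

A higher score indicates more similar prescription sequences; pairwise scores of this kind could then feed a clustering step, as the abstract describes for the actual method.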


Using concept hierarchies to improve calculation of patient similarity

Abstract: The new distance measure is an improvement over the current standard whenever a hierarchical arrangement of categorical values is available.
