Mohammed Khalilia scite author profile

BackgroundWe present a method utilizing Healthcare Cost and Utilization Project (HCUP) dataset for predicting disease risk of individuals based on their medical diagnosis history. The presented methodology may be incorporated in a variety of applications such as risk management, tailored health communication and decision support systems in healthcare.MethodsWe employed the National Inpatient Sample (NIS) data, which is publicly available through Healthcare Cost and Utilization Project (HCUP), to train random forest classifiers for disease prediction. Since the HCUP data is highly imbalanced, we employed an ensemble learning approach based on repeated random sub-sampling. This technique divides the training data into multiple sub-samples, while ensuring that each sub-sample is fully balanced. We compared the performance of support vector machine (SVM), bagging, boosting and RF to predict the risk of eight chronic diseases.ResultsWe predicted eight disease categories. Overall, the RF ensemble learning method outperformed SVM, bagging and boosting in terms of the area under the receiver operating characteristic (ROC) curve (AUC). In addition, RF has the advantage of computing the importance of each variable in the classification process.ConclusionsIn combining repeated random sub-sampling with RF, we were able to overcome the class imbalance problem and achieve promising results. Using the national HCUP data set, we predicted eight disease categories with an average AUC of 88.79%.

show abstract

The Digital Slide Archive: A Software Platform for Management, Integration, and Analysis of Histology for Cancer Research

Gutman

et al. 2017

View full text Add to dashboard Cite

Tissue based cancer studies can generate large amounts of histology data in the form of glass slides. These slides contain important diagnostic, prognostic, and biological information, and can be digitized into expansive and high-resolution whole-slide images (WSI) using slide-scanning devices. Effectively utilizing digital pathology data in cancer research requires the ability to manage, visualize, share and perform quantitative analysis on these large amounts of image data, tasks that are often complex and difficult for investigators with the current state of commercial digital pathology software. In this paper we describe the Digital Slide Archive (DSA), an open source web-based platform for digital pathology. DSA allows investigators to manage large collections of histologic images and integrate them with clinical and genomic metadata. The open-source model enables DSA to be extended to provide additional capabilities.

show abstract

Quantifying care coordination using natural language processing and domain-specific ontology

Popejoy

Khalilia

Popescu

et al. 2014

View full text Add to dashboard Cite

show abstract

Joint Entity Extraction and Assertion Detection for Clinical Text

Bhatia¹,

Celikkaya²,

Khalilia³

2019

View full text Add to dashboard Cite

Negative medical findings are prevalent in clinical reports, yet discriminating them from positive findings remains a challenging task for information extraction. Most of the existing systems treat this task as a pipeline of two separate tasks, i.e., named entity recognition (NER) and rule-based negation detection. We consider this as a multi-task problem and present a novel end-to-end neural model to jointly extract entities and negations. We extend a standard hierarchical encoder-decoder NER model and first adopt a shared encoder followed by separate decoders for the two tasks. This architecture performs considerably better than the previous rule-based and machine learning-based systems. To overcome the problem of increased parameter size especially for low-resource settings, we propose the Conditional Softmax Shared Decoder architecture which achieves state-of-art results for NER and negation detection on the 2010 i2b2/VA challenge dataset and a proprietary de-identified clinical dataset.

show abstract

Comprehend Medical: A Named Entity Recognition and Relationship Extraction Web Service

Bhatia

Celikkaya

Khalilia

et al. 2019

View full text Add to dashboard Cite

Comprehend Medical is a stateless and Health Insurance Portability and Accountability Act (HIPAA) eligible Named Entity Recognition (NER) and Relationship Extraction (RE) service launched under Amazon Web Services (AWS) trained using state-of-the-art deep learning models. Contrary to many existing open source tools, Comprehend Medical is scalable and does not require steep learning curve, dependencies, pipeline configurations, or installations. Currently, Comprehend Medical performs NER in five medical categories: Anatomy, Medical Condition, Medications, Protected Health Information (PHI) andTreatment, Test and Procedure (TTP). Additionally, the service provides relationship extraction for the detected entities as well as contextual information such as negation and temporality in the form of traits. Comprehend Medical provides two Application Programming Interfaces (API): 1) the NERe API which returns all the extracted named entities, their traits and the relationships between them and 2) the PHId API which returns just the protected health information contained in the text. Furthermore, Comprehend Medical is accessible through AWS Console, Java and Python Software Development Kit (SDK), making it easier for non-developers and developers to use.

show abstract

Improvements to the relational fuzzy c-means clustering algorithm

Khalilia

Bezdek

Popescu

et al. 2014

Pattern Recognition

View full text Add to dashboard Cite

End-to-End Joint Entity Extraction and Negation Detection for Clinical Text

Bhatia

Celikkaya

Khalilia

2019

View full text Add to dashboard Cite

Traffic pattern forecasting using time series analysis between spatially adjacent sensor clusters

Liu

Khalilia

Tan

et al. 2009

View full text Add to dashboard Cite

12 3

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.