Sheng Yu scite author profile

Natural Language Processing Technologies in Radiology Research and Clinical Applications

The migration of imaging reports to electronic medical record systems holds great potential for advancing radiology research and practice by leveraging the large volume of data that is continuously updated, integrated, and shared. However, there are significant challenges as well, largely due to the heterogeneity in how these data are formatted. Indeed, although there is movement toward structured reporting in radiology (i.e., hierarchically itemized reporting with standardized terminology), most radiology reports remain unstructured and use free-form language. To effectively "mine" these large datasets for hypothesis testing, a robust strategy for extracting the necessary information is needed. Manual extraction of information is time-consuming and often unmanageable. "Intelligent" search engines that instead rely on natural language processing (NLP), a computer-based approach to analyzing free-form text or speech, can be used to automate this data-mining task. The overall goal of NLP is to translate natural human language into a structured format (i.e., a fixed collection of elements, each with a standardized set of choices for its value) that computer programs can easily manipulate, for example to sort reports into subcategories or to query for the presence or absence of a finding. The authors review the fundamentals of NLP and describe various techniques that constitute NLP in radiology, along with some key applications.

After completing this journal-based SA-CME activity, participants will be able to:

■ Describe the set of technologies that compose present-day natural language processing in radiology.
■ List examples of how these technologies have been combined to achieve specific objectives in radiology research and, potentially, clinical practice.
■ Discuss current capabilities and possible future applications of natural language processing in radiology.
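As a toy illustration of mapping free text onto a fixed collection of elements, the Python sketch below assigns each finding one of three standardized values ("present", "absent", "not mentioned") using a keyword lexicon and a crude negation check. The lexicon, the negation cues, and the five-word window are hypothetical choices loosely inspired by NegEx-style rules; they are not taken from the article, and real radiology NLP systems are considerably more sophisticated.

```python
import re

# Hypothetical lexicon: each structured element maps to surface terms that may
# appear in a free-text report. Real systems use standardized terminologies.
FINDINGS = {
    "pneumothorax": ["pneumothorax"],
    "pleural_effusion": ["pleural effusion", "effusion"],
    "consolidation": ["consolidation", "airspace opacity"],
}

# Hypothetical, simplified negation cues (loosely NegEx-inspired).
NEGATION_CUES = ["no", "without", "negative for", "no evidence of", "free of"]
NEGATION_WINDOW = 5  # number of words before a term scanned for a cue


def _is_negated(prefix_words):
    """True if a negation cue occurs in the last few words before a term."""
    window = " ".join(prefix_words[-NEGATION_WINDOW:])
    return any(re.search(rf"\b{re.escape(cue)}\b", window) for cue in NEGATION_CUES)


def structure_report(text):
    """Map a free-text report to a fixed set of elements, each valued
    'present', 'absent', or 'not mentioned'."""
    record = {name: "not mentioned" for name in FINDINGS}
    # Treat periods, semicolons, and newlines as sentence boundaries.
    for sentence in re.split(r"[.;\n]+", text.lower()):
        for name, terms in FINDINGS.items():
            for term in terms:
                match = re.search(rf"\b{re.escape(term)}\b", sentence)
                if match is None:
                    continue
                prefix = sentence[: match.start()].split()
                value = "absent" if _is_negated(prefix) else "present"
                # A positive mention anywhere in the report overrides a negated one.
                if record[name] != "present":
                    record[name] = value
    return record
```

Once reports are in this form, querying for a finding becomes a simple filter, e.g. `[r for r in reports if structure_report(r)["pneumothorax"] == "present"]`. A fixed-width negation window is a known weakness of such rules (a distant "no" can wrongly negate a term), which is one reason richer techniques are used in practice.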

High Throughput Phenotyping for Dimensional Psychopathology in Electronic Health Records

McCoy, Hart, et al. 2018, Biological Psychiatry

Surrogate-assisted feature extraction for high-throughput phenotyping

Chakrabortty, Liao, et al. 2016

Large-scale identification of patients with cerebral aneurysms using natural language processing

et al. 2017

Objective: To use natural language processing (NLP) in conjunction with the electronic medical record (EMR) to accurately identify patients with cerebral aneurysms and their matched controls.

Methods: ICD-9 and Current Procedural Terminology (CPT) codes were used to obtain an initial data mart of potential aneurysm patients from the EMR. NLP was then used to train a classification algorithm, with .632 bootstrap cross-validation used to correct for overfitting bias. The classification rule was then applied to the full data mart. Additional validation was performed on 300 patients classified as having aneurysms. Controls were obtained by matching on age, sex, race, and healthcare use.

Results: Of 4.2 million patients, we identified 55,675 with ICD-9 and CPT codes consistent with cerebral aneurysms. Of those, 16,823 patients had the term "aneurysm" occur near relevant anatomic terms. After training, a final algorithm consisting of 8 coded and 14 NLP variables was selected, yielding an overall area under the receiver operating characteristic curve of 0.95. After the final algorithm was applied, 5,589 patients were classified as having aneurysms, and 54,952 controls were matched to those patients. The positive predictive value based on a validation cohort of 300 patients was 0.86.

Conclusions: We harnessed the power of the EMR by applying NLP to obtain a large cohort of patients with intracranial aneurysms and their matched controls. Such algorithms can be generalized to other diseases for epidemiologic and genetic studies. Cerebral aneurysm is a potentially devastating disorder that affects nearly 3% of the population.
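The .632 bootstrap mentioned in the Methods blends an optimistic apparent estimate (model trained and scored on the same data) with a pessimistic out-of-bag estimate: AUC(.632) = 0.368 × AUC(apparent) + 0.632 × AUC(out-of-bag). The Python sketch below applies this to the area under the ROC curve. It assumes a simplified variant that averages per-resample out-of-bag AUCs, and it uses a hypothetical stand-in classifier (`fit_mean_difference`); the abstract does not describe the study's model or exact resampling procedure.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic
    (tied positive/negative pairs count as half-concordant)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    total = 0.0
    for p in pos:
        for n in neg:
            total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos) * len(neg))


def fit_mean_difference(X, y):
    """Hypothetical stand-in classifier: a linear score whose weights are the
    difference between positive- and negative-class feature means."""
    d = len(X[0])
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    w = [sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
         for j in range(d)]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def auc_632(X, y, fit=fit_mean_difference, n_boot=200, seed=0):
    """Simplified .632 bootstrap AUC: 0.368 * apparent + 0.632 * mean out-of-bag AUC."""
    rng = random.Random(seed)
    n = len(y)
    model = fit(X, y)
    apparent = auc([model(x) for x in X], y)
    oob_aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        yb = [y[i] for i in idx]
        y_oob = [y[i] for i in oob]
        # Skip resamples that cannot fit the model or score both classes.
        if len(set(yb)) < 2 or len(set(y_oob)) < 2:
            continue
        m = fit([X[i] for i in idx], yb)
        oob_aucs.append(auc([m(X[i]) for i in oob], y_oob))
    if not oob_aucs:
        raise ValueError("no usable bootstrap resamples")
    oob_mean = sum(oob_aucs) / len(oob_aucs)
    return 0.368 * apparent + 0.632 * oob_mean
```

The weights 0.368 and 0.632 come from the fact that a bootstrap resample contains, on average, about 63.2% of the distinct original observations, so the out-of-bag estimate alone tends to be pessimistic.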

Association of intracranial aneurysm rupture with smoking duration, intensity, and cessation

et al. 2017

High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP)

et al. 2019

Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold-standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from narrative notes using natural language processing (NLP). Its standardized steps combine automated procedures that reduce manual input with machine learning approaches for algorithm training. PheCAP itself can be executed in 1-2 days if all data are available; however, the overall timing depends largely on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no).
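The abstract lists both a per-patient probability and a yes/no classification among PheCAP's outputs. One common way to turn the former into the latter is to choose a probability cutoff on the chart-reviewed (gold-standard) subset, for instance the lowest cutoff that reaches a target specificity, and then apply it to every patient. The Python sketch below assumes the probabilities were already produced by a fitted phenotype model; the function names and the 0.95 specificity target are illustrative assumptions, not PheCAP's documented interface, and the actual software may use a different cutoff rule.

```python
def choose_cutoff(probs, labels, target_specificity=0.95):
    """Return the lowest cutoff c such that, among chart-reviewed negatives,
    the fraction with prob < c is at least target_specificity.
    Patients with prob >= c are classified 'yes'. (Hypothetical helper.)"""
    negatives = [p for p, y in zip(probs, labels) if y == 0]
    if not negatives:
        raise ValueError("need at least one chart-reviewed negative")
    # Candidate cutoffs: every observed probability, plus +inf (which always
    # gives specificity 1.0, so the loop is guaranteed to return).
    for c in sorted(set(probs)) + [float("inf")]:
        specificity = sum(p < c for p in negatives) / len(negatives)
        if specificity >= target_specificity:
            return c


def sensitivity_at(probs, labels, cutoff):
    """Fraction of chart-reviewed positives classified 'yes' at this cutoff."""
    positives = [p for p, y in zip(probs, labels) if y == 1]
    return sum(p >= cutoff for p in positives) / len(positives)


def classify(all_probs, cutoff):
    """Apply the cutoff to every patient's phenotype probability."""
    return ["yes" if p >= cutoff else "no" for p in all_probs]
```

Fixing specificity on the labeled set trades some sensitivity for a low false-positive rate, which is often preferred when the resulting cohort feeds genetic or epidemiologic analyses.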

Sheng Yu

Enabling phenotypic big data with PheNorm

Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources

Natural Language Processing Technologies in Radiology Research and Clinical Applications

High Throughput Phenotyping for Dimensional Psychopathology in Electronic Health Records

Surrogate-assisted feature extraction for high-throughput phenotyping

Large-scale identification of patients with cerebral aneurysms using natural language processing

Association of intracranial aneurysm rupture with smoking duration, intensity, and cessation

High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP)
