The Breast Imaging Reporting and Data System (BI-RADS) was developed to reduce variation in the descriptions of findings. Manual analysis of breast radiology report data is challenging but is necessary for clinical and healthcare quality assurance activities. The objective of this study is to develop a natural language processing (NLP) system for automated BI-RADS category extraction from breast radiology reports. We evaluated an existing rule-based NLP algorithm, and then we developed and evaluated our own method using a supervised machine learning approach. We divided the BI-RADS category extraction task into two specific tasks: (1) annotation of all BI-RADS category values within a report, and (2) classification of the laterality of each BI-RADS category value. We used one algorithm for task 1 and evaluated three algorithms for task 2. Across all evaluations and model training, we used a total of 2159 radiology reports from 18 hospitals, from 2003 to 2015. Performance with the existing rule-based algorithm was not satisfactory. Conditional random fields showed high performance for task 1, with an F1 measure of 0.95. The rules from partial decision trees (PART) algorithm showed the best performance across classes for task 2, with a weighted F1 measure of 0.91 for BI-RADS 0–6 and 0.93 for BI-RADS 3–5. Classification performance by class improved for all classes from Naïve Bayes to Support Vector Machine (SVM), and again from SVM to PART. Our system is able to annotate and classify all BI-RADS mentions present in a single radiology report and can serve as the foundation for future studies that will leverage automated BI-RADS annotation to provide feedback to radiologists as part of a learning health system loop.
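As an illustration of the weighted F1 measure reported above, the following is a minimal sketch of how a support-weighted F1 can be computed for a multi-class task such as laterality classification. The function names and the example labels ("left", "right", "bilateral") are hypothetical and are not taken from the study; the authors' actual evaluation code is not described in the abstract.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return a dict mapping each label to its F1 score."""
    labels = sorted(set(gold) | set(pred))
    scores = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[label] = 2 * precision * recall / (precision + recall)
        else:
            scores[label] = 0.0
    return scores


def weighted_f1(gold, pred):
    """Average the per-class F1 scores, weighting each class by its
    number of gold-standard instances (its support)."""
    support = Counter(gold)
    scores = per_class_f1(gold, pred)
    return sum(scores[label] * n for label, n in support.items()) / len(gold)


# Hypothetical laterality labels, for illustration only.
gold = ["left", "left", "right", "bilateral"]
pred = ["left", "right", "right", "bilateral"]
print(weighted_f1(gold, pred))
```

In this toy example one of the two "left" mentions is misclassified as "right", so both of those classes score an F1 of 2/3 while "bilateral" scores 1, giving a weighted F1 of about 0.75.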
Precise phenotype information is needed to understand the effects of genetic and epigenetic changes on tumor behavior and responsiveness. Extraction and representation of cancer phenotypes is currently performed mostly by hand, making it difficult to correlate phenotypic data with genomic data. In addition, genomic data is being produced at an ever faster pace, exacerbating the problem. The DeepPhe software enables automated extraction of detailed phenotype information from electronic medical records of cancer patients. The system implements advanced natural language processing and knowledge engineering methods within a flexible modular architecture, and was evaluated using a manually annotated dataset of University of Pittsburgh Medical Center (UPMC) breast cancer patients. The resulting platform provides critical, previously missing methods for computational phenotyping. Working in tandem with advanced analysis of high-throughput sequencing, these approaches will further accelerate the transition to precision cancer treatment.
Background: The Cancer Genome Atlas Project (TCGA) is a National Cancer Institute effort to profile at least 500 cases of 20 different tumor types using genomic platforms and to make these data, both raw and processed, available to all researchers. TCGA data are currently over 1.2 petabytes in size and include whole genome sequence (WGS), whole exome sequence, methylation, RNA expression, proteomic, and clinical datasets. Publicly accessible TCGA data are released through public portals, but many challenges exist in navigating and using data obtained from these sites. We developed TCGA Expedition to support the research community focused on computational methods for cancer research. Data obtained, versioned, and archived using TCGA Expedition support command line access at high-performance computing facilities as well as some functionality with third party tools. For a subset of TCGA data collected at University of Pittsburgh, we also re-associate TCGA data with de-identified data from the electronic health records. Here we describe the software as well as the architecture of our repository, methods for loading of TCGA data to multiple platforms, and security and regulatory controls that conform to federal best practices. Results: TCGA Expedition software consists of a set of scripts written in Bash, Python and Java that download, extract, harmonize, version and store all TCGA data and metadata. The software generates a versioned, participant- and sample-centered, local TCGA data directory with metadata structures that directly reference the local data files as well as the original data files. The software supports flexible searches of the data via a web portal, user-centric data tracking tools, and data provenance tools.
Using this software, we created a collaborative repository, the Pittsburgh Genome Resource Repository (PGRR), that enabled investigators at our institution to work with all TCGA data formats and to interrogate these data with analysis pipelines and associated tools. WGS data are especially challenging for individual investigators to use, due to issues with downloading, storage, and processing; having locally accessible WGS BAM files has proven invaluable. Conclusion: Our open-source, freely available TCGA Expedition software can be used to create a local collaborative infrastructure for acquiring, managing, and analyzing TCGA data and other large public datasets.
Objective: Previous studies in our laboratory have shown the benefits of immediate feedback on cognitive performance for pathology residents using an Intelligent Tutoring System (ITS) in Pathology. In this study, we examined the effect of immediate feedback on metacognitive performance, and investigated whether other metacognitive scaffolds will support metacognitive gains when immediate feedback is faded. Methods: Twenty-three (23) participants were randomized into intervention and control groups. For both groups, periods working with the ITS under varying conditions were alternated with independent computer-based assessments. On day 1, a within-subjects design was used to evaluate the effect of immediate feedback on cognitive and metacognitive performance. On day 2, a between-subjects design was used to compare the use of other metacognitive scaffolds (intervention group) against no metacognitive scaffolds (control group) on cognitive and metacognitive performance, as immediate feedback was faded. Measurements included learning gains (a measure of cognitive performance), as well as several measures of metacognitive performance, including Goodman-Kruskal Gamma correlation (G), Bias, and Discrimination. For the intervention group, we also computed metacognitive measures during tutoring sessions. Results: Immediate feedback in an intelligent tutoring system had a statistically significant positive effect on learning gains, G, and Discrimination. Removal of immediate feedback was associated with decreasing metacognitive performance, and this decline was not prevented when students used a version of the tutoring system that provided other metacognitive scaffolds. Results obtained directly from the ITS suggest that other metacognitive scaffolds do have a positive effect on G and Discrimination, as immediate feedback is faded. Conclusions: Immediate feedback had a positive effect on both metacognitive and cognitive gains in a medical tutoring system.
Other metacognitive scaffolds were not sufficient to replace immediate feedback in this study. However, results obtained directly from the tutoring system are not consistent with results obtained from assessments. In order to facilitate transfer to real-world tasks, further research will be needed to determine the optimum methods for supporting metacognition as immediate feedback is faded.
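The three metacognitive measures named above can be sketched as follows. The abstract does not give formulas, so this sketch assumes definitions commonly used in the metacognition literature: Gamma as (concordant − discordant) / (concordant + discordant) over pairs of confidence ratings and correctness, Bias as mean confidence minus mean accuracy, and Discrimination as mean confidence on correct items minus mean confidence on incorrect items. The function names and the 0-to-1 confidence scale are hypothetical; the study's own computation may differ.

```python
from itertools import combinations
from statistics import mean


def goodman_kruskal_gamma(confidence, correct):
    """Gamma over (confidence, correctness) pairs; tied pairs are ignored.
    Returns None when every pair is tied."""
    concordant = discordant = 0
    for (c1, k1), (c2, k2) in combinations(zip(confidence, correct), 2):
        product = (c1 - c2) * (k1 - k2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


def bias(confidence, correct):
    """Assumed definition: mean confidence minus mean accuracy.
    Positive values indicate overconfidence."""
    return mean(confidence) - mean(correct)


def discrimination(confidence, correct):
    """Assumed definition: mean confidence on correct items minus
    mean confidence on incorrect items."""
    right = [c for c, k in zip(confidence, correct) if k]
    wrong = [c for c, k in zip(confidence, correct) if not k]
    if not right or not wrong:
        return None
    return mean(right) - mean(wrong)


# Hypothetical ratings: confidence on a 0-1 scale, correctness as 0/1.
confidence = [0.9, 0.8, 0.3, 0.2]
correct = [1, 1, 0, 0]
print(goodman_kruskal_gamma(confidence, correct))
print(bias(confidence, correct))
print(discrimination(confidence, correct))
```

In this toy data the participant is always more confident on correct items than on incorrect ones, so Gamma reaches its maximum of 1.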