In the assessment of nodules in CT scans of the lungs, a number of image-derived features are diagnostically relevant. Currently, many of these features are defined only qualitatively, which makes them difficult to quantify from first principles. Nevertheless, these features (through their qualitative definitions and interpretations thereof) are often quantified via a variety of mathematical methods for the purpose of computer-aided diagnosis (CAD). To determine the potential usefulness of quantified diagnostic image features as inputs to a CAD system, we investigate the predictive capability of statistical learning methods for classifying nodule malignancy. We use the Lung Image Database Consortium dataset and employ only the radiologist-assigned diagnostic feature values for the lung nodules therein, together with our estimates of nodule diameter and volume derived from the radiologists' annotations. We calculate theoretical upper bounds on the classification accuracy achievable by an ideal classifier that uses only the radiologist-assigned feature values. We obtain an accuracy of 85.74 [Formula: see text], which is, on average, 4.43 percentage points below the theoretical maximum of 90.17%. The corresponding area-under-the-curve (AUC) score is 0.932 ([Formula: see text]); when the diameter and volume features are included, the AUC increases to 0.949 ([Formula: see text]) and the accuracy to 88.08 [Formula: see text]. Our results are comparable to those in the literature that use algorithmically derived image-based features. This supports our hypothesis that lung nodules can be classified as malignant or benign using only quantified diagnostic image features, and indicates the competitiveness of this approach.
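The abstract does not spell out how the theoretical upper bound is computed. One common way to obtain such a bound for discrete ratings, sketched here as an assumption rather than the authors' exact procedure, rests on a simple observation: nodules that share an identical vector of radiologist ratings are indistinguishable to any classifier that sees only those ratings. The best possible rule therefore predicts the majority label within each group, and the accuracy of that rule on the dataset is the bound. The function name `accuracy_upper_bound` and the toy ratings below are hypothetical, not LIDC data.

```python
from collections import Counter, defaultdict


def accuracy_upper_bound(features, labels):
    """Best accuracy any classifier can reach on (features, labels) when it
    sees only the given discrete feature vectors.

    Hypothetical helper: nodules with identical feature vectors cannot be
    told apart, so the ideal rule predicts the majority label of each group.
    """
    groups = defaultdict(Counter)
    for x, y in zip(features, labels):
        groups[tuple(x)][y] += 1
    # Within each group, the majority label is the most that can be right.
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(labels)


# Toy ratings, e.g. (spiculation, lobulation) on a 1-5 scale; 1 = malignant.
features = [(1, 1), (1, 1), (1, 1), (5, 4), (5, 4), (3, 2)]
labels = [0, 0, 1, 1, 1, 0]
print(round(accuracy_upper_bound(features, labels), 4))  # 0.8333
```

Because the bound is computed on the same data it describes, it caps the accuracy of any classifier restricted to these features on this dataset, but it says nothing about how well a classifier generalizes to new nodules.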
We also analyze how the classification accuracy depends on specific features and feature subsets, and we rank the features according to their predictive power, statistically demonstrating the top four to be spiculation, lobulation, subtlety, and calcification.
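The abstract does not state which ranking criterion was used. As a hypothetical illustration only, one simple criterion reuses the idea of an ideal classifier: score each feature by the best accuracy attainable from that feature alone, then sort the features by that score. The names `single_feature_bound` and `rank_features` and the toy values are illustrative assumptions. This sketch performs no significance testing, so it does not reproduce the statistical demonstration described above.

```python
from collections import Counter, defaultdict


def single_feature_bound(column, labels):
    """Majority-label accuracy when a classifier sees only one feature.

    Hypothetical scoring rule, not necessarily the one used in the study.
    """
    groups = defaultdict(Counter)
    for value, y in zip(column, labels):
        groups[value][y] += 1
    return sum(max(c.values()) for c in groups.values()) / len(labels)


def rank_features(table, labels):
    """Rank features by single-feature bound, highest first.

    `table` maps a feature name to a list of values, one per nodule.
    """
    scores = {name: single_feature_bound(col, labels) for name, col in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data only: four nodules, two features, 1 = malignant.
table = {"spiculation": [1, 1, 5, 5], "texture": [5, 5, 5, 5]}
labels = [0, 0, 1, 1]
print(rank_features(table, labels))  # [('spiculation', 1.0), ('texture', 0.5)]
```

Scoring features one at a time ignores interactions between them; analyzing feature subsets, as the abstract also describes, would require scoring combinations of columns instead.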
The dataset contains annotations for lung nodules collected by the Lung Image Database Consortium and Image Database Resource Initiative (LIDC), stored as standard DICOM objects. The annotations accompany a collection of computed tomography (CT) scans for over 1000 subjects annotated by multiple expert readers, and correspond to "nodules ≥ 3 mm", defined as any lesion considered to be a nodule with greatest in-plane dimension in the range 3-30 mm, regardless of presumed histology. The present dataset aims to simplify reuse of the data with readily available tools, and is targeted towards researchers interested in the analysis of lung CT images.
Acquisition and validation methods: Open-source tools were used to parse the project-specific XML representation of the LIDC-IDRI annotations and save the result as standard DICOM objects. Validation procedures focused on establishing compliance of the resulting objects with the standard, consistency of the data between the DICOM and project-specific representations, and interoperability with existing tools.
Data format and usage notes: The dataset uses DICOM Segmentation objects to store annotations of the lung nodules, and DICOM Structured Reporting objects to communicate qualitative evaluations (nine attributes) and quantitative measurements (three attributes) associated with the nodules. In total, the 875 subjects contain 6859 nodule annotations. Clustering of neighboring annotations resulted in 2651 distinct nodules. The data are available in TCIA at https://doi.org/10.7937/TCIA.2018.h7umfurq.
Potential applications: The standardized dataset maintains the content of the original contribution of the LIDC-IDRI consortium, and should be helpful in developing automated tools for characterization of lung lesions and image phenotyping.
In addition to those properties, the representation of the present dataset makes it more FAIR (Findable, Accessible, Interoperable, Reusable) for the research community, and enables its integration with other standardized data collections.
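The abstract reports that clustering neighboring annotations reduced 6859 annotations to 2651 distinct nodules, but it does not describe the clustering rule. A minimal sketch, assuming each annotation is summarized by its centroid in millimeters and that annotations within a hypothetical distance `tol_mm` of each other belong to the same nodule, is single-linkage grouping with a union-find structure. The function name `cluster_annotations`, the centroid representation, and the threshold are all assumptions, not the procedure used to build the dataset.

```python
import math


def cluster_annotations(centroids, tol_mm):
    """Group annotation centroids (x, y, z in mm) by single linkage.

    Hypothetical rule: two annotations are linked when their centroids are
    at most tol_mm apart; linked annotations form one nodule. Returns a list
    of clusters, each a list of annotation indices in ascending order.
    """
    parent = list(range(len(centroids)))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= tol_mm:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(centroids)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy example: three readers mark one nodule, a fourth marks another.
centroids = [(10, 10, 50), (11, 10, 50), (80, 40, 120), (10.5, 11, 51)]
print(cluster_annotations(centroids, tol_mm=3.0))  # [[0, 1, 3], [2]]
```

Single linkage can chain two adjacent but distinct nodules into one cluster when intermediate annotations bridge them; a practical implementation would also compare only annotations from the same scan.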
The Lung Image Database Consortium and Image Database Resource Initiative (LIDC) conducted a multi-site reader study that produced a comprehensive database of computed tomography (CT) scans for over 1000 subjects annotated by multiple expert readers. The result is hosted in the LIDC-IDRI collection of The Cancer Imaging Archive (TCIA). Annotations that accompany the images of the collection are stored using a project-specific XML representation. This complicates their reuse, since no general-purpose tools are available to visualize or query those objects, and makes harmonization with other data of a similar type non-trivial. To make the LIDC dataset more FAIR (Findable, Accessible, Interoperable, Reusable) for the research community, we prepared a standardized representation of the annotations using the Digital Imaging and Communications in Medicine (DICOM) standard. This manuscript is intended to serve as a companion to the dataset to facilitate its reuse.