Rationale and Objectives: Tumor volume change has potential as a biomarker for diagnosis, therapy planning, and treatment response. Precision was evaluated and compared among semi-automated lung tumor volume measurement algorithms applied to clinical thoracic CT datasets. The results inform approaches and testing requirements for establishing conformance with the Quantitative Imaging Biomarker Alliance (QIBA) CT Volumetry Profile. Materials and Methods: Industry and academic groups participated in a challenge study. Intra-algorithm repeatability and inter-algorithm reproducibility were estimated. The relative magnitudes of various sources of variability were estimated using a linear mixed-effects model. Segmentation boundaries were compared to give developers a basis on which to optimize algorithm performance. Results: Intra-algorithm repeatability ranged from 13% (best performing) to 100% (worst performing), with most algorithms demonstrating improved repeatability as tumor size increased. Inter-algorithm reproducibility was determined in three partitions and found to be 58% for the four best-performing groups, 70% for the set of groups meeting repeatability requirements, and 84% when all groups but the worst performer were included. The best-performing partition performed markedly better on tumors with equivalent diameters above 40 mm. Larger tumors benefitted from human editing, but smaller tumors did not. One-fifth to one-half of the total variability came from sources independent of the algorithms. Segmentation boundaries differed substantially, not just in overall volume but in detail. Conclusions: Nine of the twelve participating algorithms passed precision requirements similar to those indicated in the QIBA Profile, with the caveat that the current study was not designed to explicitly evaluate algorithm conformance with the Profile. Change in tumor volume can be measured with confidence to within ±14% using any of these nine algorithms on tumor sizes above 10 mm.
No partition of the algorithms was able to meet the QIBA requirements for interchangeability down to 10 mm, though the partition comprising the best-performing algorithms did meet this requirement above a tumor size of approximately 40 mm.
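The repeatability figures quoted above are typically expressed as a repeatability coefficient (RC = 2.77 × within-subject standard deviation from replicate measurements of the same tumors). A minimal sketch of that computation follows; the function name and data are illustrative, not the study's actual analysis code.

```python
import math

def repeatability_coefficient(replicates):
    """RC = 2.77 * within-subject SD, estimated from groups of repeat
    measurements on the same tumors (here on a percent-volume scale)."""
    # Within-subject variance: mean of the per-tumor sample variances.
    wsv = 0.0
    for reps in replicates:
        m = sum(reps) / len(reps)
        wsv += sum((x - m) ** 2 for x in reps) / (len(reps) - 1)
    wsv /= len(replicates)
    return 2.77 * math.sqrt(wsv)

# Illustrative data: two repeat volume measurements per tumor (% of truth).
tumors = [(102.0, 98.0), (95.0, 101.0), (100.0, 104.0)]
rc = repeatability_coefficient(tumors)
```

Two measurements on the same tumor are then expected to differ by less than RC for roughly 95% of tumors, which is the sense in which the 13%–100% repeatability range above should be read.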
Rationale and Objectives: Quantifying changes in lung tumor volume is important for diagnosis, therapy planning, and evaluation of response to therapy. The aim of this study was to assess the performance of multiple algorithms on a reference data set. The study was organized by the Quantitative Imaging Biomarker Alliance (QIBA). Materials and Methods: The study was organized as a public challenge. Computed tomography scans of synthetic lung tumors in an anthropomorphic phantom were acquired by the Food and Drug Administration. Tumors varied in size, shape, and radiodensity. Participants applied their own semi-automated volume estimation algorithms that either did not allow or allowed post-segmentation correction (type 1 or 2, respectively). Statistical analysis of accuracy (percent bias) and precision (repeatability and reproducibility) was conducted across algorithms, as well as across nodule characteristics, slice thickness, and algorithm type. Results: Eighty-four percent of volume measurements of QIBA-compliant tumors were within 15% of the true volume, ranging from 66% to 93% across algorithms, compared to 61% of volume measurements for all tumors (ranging from 37% to 84%). Algorithm type did not affect bias substantially; however, it was an important factor in measurement precision. Algorithm precision was notably better as tumor size increased, worse for irregularly shaped tumors, and on average better for type 1 algorithms. Over all nodules meeting the QIBA Profile, precision, as measured by the repeatability coefficient, was 9.0% compared to 18.4% overall. Conclusion: The results achieved in this study, using a heterogeneous set of measurement algorithms, support QIBA quantitative performance claims in terms of volume measurement repeatability for nodules meeting the QIBA Profile criteria.
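The accuracy figures above (the fraction of measurements within 15% of the true volume) amount to a simple percent-error tally against phantom ground truth. A minimal sketch, with illustrative values rather than the study's data:

```python
def percent_error(measured, true):
    """Signed percent error of a measured volume against ground truth."""
    return 100.0 * (measured - true) / true

def fraction_within(measured, true_vals, tol_pct=15.0):
    """Fraction of measurements whose absolute percent error is <= tol_pct."""
    errs = [abs(percent_error(m, t)) for m, t in zip(measured, true_vals)]
    return sum(e <= tol_pct for e in errs) / len(errs)

# Illustrative phantom volumes (mm^3) and measured values:
true_vols = [500.0, 500.0, 1000.0, 1000.0]
meas_vols = [540.0, 420.0, 1100.0, 980.0]   # errors: +8%, -16%, +10%, -2%
frac = fraction_within(meas_vols, true_vols)
```

Here three of the four illustrative measurements fall within the 15% tolerance; the study reports the same statistic per algorithm and pooled over QIBA-compliant versus all tumors.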
Hybrid datasets yielded conclusions similar to those from real computed tomography datasets: algorithms that were QIBA compliant on phantom data were generally also compliant on hybrid datasets. Some groups were deemed compliant for simulated lesions but not for physical lesion measurements, although the magnitude of this difference was small (<5.4%). While the two technical performances are not equivalent, they correlate, such that volumetrically simulated lesions could potentially serve as practical proxies.
e13557 Background: Lung cancer is the leading cause of cancer death worldwide, accounting for more than 160,000 deaths in the US. The purpose of this study was to determine whether inter-reader variability in the sum of diameters (SOD) of tumor burden correlates with variability in endpoint assessment of lung cancer progression. RECIST 1.1 is based on the SOD of target lesions seen on imaging studies. Response criteria for evaluation of target lesions include complete response (CR), partial response (PR), progressive disease (PD), and stable disease (SD). The key determinant of patient response is the target lesion response, which in turn is determined by the SOD. Inter-reader variability studies play an important role in the development of reliable diagnostic tools and the understanding of imaging outcomes, given confounding factors in lung cancer, such as effusion, atelectasis, and consolidation, that affect target lesion selection. Methods: A retrospective analysis of 470 patients was carried out using RECIST 1.1. Double read with adjudication is the preferred read model for submission studies, in which images are read by two independent reviewers blinded to treatment allocation. Per RECIST 1.1, lesions were measured along the longest diameter for non-nodal lesions and along the short axis for nodal lesions, followed by calculation of the SOD for total tumor burden. If the two primary reviewers disagree, a third radiologist, the "adjudicator", reviews the assessments performed by the first two radiologists and selects the more accurate one. For further analysis, patients were divided into two groups: one with no adjudication (agreement between both readers) and a second with adjudication (disagreement between both readers); ANOVA was used to perform the analysis of variance. Results: Of 470 patients, 332 with disagreement were adjudicated, while both readers agreed on the assessments of 138 patients.
The SOD at baseline visits for all patients was assessed using single-factor ANOVA, with the following results: for the disagreement group, the F ratio of 4.76 exceeded F critical (3.86) with a P value of 0.03, while for the agreement group the F value was less than F critical. Conclusions: There is a direct relationship between variability in baseline SOD between two readers and the likelihood of disagreement in their endpoint assessment. Additional rules around selection and measurement of target lesions should be proposed in protocols to reduce variability and improve endpoint assessment outcomes. [Table: see text]
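The single-factor ANOVA used here compares between-reader to within-reader variance via the F ratio. A hedged sketch of that computation, with made-up SOD values standing in for the study's data (the two groups are the two readers' baseline SODs):

```python
def anova_f(groups):
    """One-way ANOVA F ratio: between-group mean square over
    within-group mean square."""
    k = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total observations
    grand = sum(sum(g) for g in groups) / n      # grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                     for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Illustrative baseline SODs (mm) from two readers on the same patients:
reader1 = [52.0, 61.0, 47.0, 58.0]
reader2 = [64.0, 70.0, 59.0, 66.0]
f = anova_f([reader1, reader2])
```

The resulting F is then compared against the critical value for the relevant degrees of freedom (3.86 in the study's disagreement group); an F above the critical value indicates reader-to-reader differences larger than chance would explain.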