6556 Background: Survival prediction models for lung cancer patients could help guide their care and therapy decisions. The objective of this study was to predict the probability of survival beyond 90, 180 and 360 days from any point in a lung cancer patient’s journey. Methods: We developed a gradient boosting model (XGBoost) using data from 55k lung cancer patients in the ASCO CancerLinQ database. The model used 3,958 unique variables, including Dx and Rx codes, biomarkers, surgeries and lab tests from ≤1 year prior to the prediction point, which was chosen at random for each patient. We used 40% of the data for training, 25% for hyper-parameter tuning, 20% for testing and 15% for holdout validation. Death dates available in the Electronic Health Record were cross-checked by linkage to death registries. Results: The model was validated on the holdout set of 8,468 patients. The Area Under the Curve (AUC) for the model was 0.79. The precision and recall for predicting survival beyond the three time points were between 0.7-0.8 and 0.8-0.9, respectively (see table). This compares favourably to other lung cancer survival models created using different machine learning techniques (Jochems 2017, Dekker 2009). A Cox-PH model created using the top 20 variables also had significantly lower performance (see table). Analysis of input variables yielded distinctive patterns for patient subgroups and time points. Tumor status, medications, lab values and functional status were found to be significant in patient subcohorts. Conclusions: An AI model to predict survival of lung cancer patients built using a large real-world dataset yielded high accuracy. This general model can further be used to predict survival of subcohorts stratified by variables such as stage or various treatment effects. Such a model could be useful for assessing patient risk and treatment options, evaluating cost and quality of care or determining clinical trial eligibility. [Table: see text]
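The four-way split described in the Methods (40% training, 25% tuning, 20% testing, 15% holdout) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `split_patients`, the fixed seed, and the use of a simple random shuffle at the patient level are all hypothetical choices.

```python
import random


def split_patients(patient_ids, fractions=(0.40, 0.25, 0.20, 0.15), seed=0):
    """Shuffle patient IDs and partition them into four disjoint subsets.

    The fractions follow the abstract; everything else is an assumption.
    """
    names = ("train", "tune", "test", "holdout")
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle

    n = len(ids)
    splits = {}
    start = 0
    cumulative = 0.0
    for i, (name, frac) in enumerate(zip(names, fractions)):
        cumulative += frac
        # The last subset takes whatever remains, so no patient is dropped.
        end = n if i == len(names) - 1 else round(n * cumulative)
        splits[name] = ids[start:end]
        start = end
    return splits


splits = split_patients(range(1000))
for name, ids in splits.items():
    print(name, len(ids))
```

Splitting by patient rather than by record keeps every observation for a given patient in a single subset, which avoids leakage between training and holdout data.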
impact of varying adherence rates on the relative benefits of FIT and mt-sDNA screening. Methods: Sensitivity and specificity from DeeP-C trial data were used for screening inputs. Predicted outcomes of annual FIT and triennial mt-sDNA were simulated for individuals born in 1975 who were free of diagnosed CRC at age 40 and screened between ages 50-75. Adherence was set by assuming a fixed annual likelihood to comply ranging from 0-100%, in 10% increments. It was assumed that patients were offered a stool-based screening test yearly unless they were not due for screening. Predicted outcomes are per 1000 individuals versus no screening. Results: Each screening strategy yielded higher life-years gained (LYG) versus no screening. At perfect adherence, mt-sDNA resulted in 4.1% fewer LYG (LYG=302.2; colonoscopies=1856) versus FIT (LYG=315.2; colonoscopies=1915). At imperfect adherence rates of 70% for triennial mt-sDNA and 40% for annual FIT, mt-sDNA resulted in a 19.1% increase in LYG (288.9; colonoscopies=1724) versus FIT (242.5; colonoscopies=1218). LYG for FIT was more sensitive to per-unit change in adherence rates ([315.2 - 101.2]/[100% - 10%] = 2.4 LYG/unit change) than mt-sDNA (1.8 LYG/unit change). At equivalent adherence, mt-sDNA generally resulted in higher colonoscopies and lower stool testing vs FIT. Conclusions: Stool-based CRC screening provides higher LYG vs no screening, regardless of adherence assumptions. The comparative effectiveness of FIT versus mt-sDNA screening changes dramatically when assuming adherence is <100%, with mt-sDNA outperforming FIT under adherence assumptions that are more consistent with available, although currently incomplete, real-world evidence.
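The headline figures in the Results can be reproduced with simple arithmetic. The sketch below assumes the per-unit sensitivity expression reads (315.2 - 101.2)/(100 - 10), that is, the change in LYG for FIT between 10% and 100% adherence divided by the change in adherence in percentage points. The function names are hypothetical.

```python
def relative_difference_pct(value, reference):
    """Percent difference of `value` relative to `reference`."""
    return (value - reference) / reference * 100


def lyg_per_adherence_point(lyg_high, lyg_low, adherence_high_pct, adherence_low_pct):
    """Change in life-years gained per percentage point of adherence."""
    return (lyg_high - lyg_low) / (adherence_high_pct - adherence_low_pct)


# Perfect adherence: mt-sDNA (302.2 LYG) versus FIT (315.2 LYG).
perfect = relative_difference_pct(302.2, 315.2)

# Imperfect adherence: mt-sDNA at 70% (288.9 LYG) versus FIT at 40% (242.5 LYG).
imperfect = relative_difference_pct(288.9, 242.5)

# Sensitivity of FIT to adherence, using the 10% and 100% adherence points.
fit_slope = lyg_per_adherence_point(315.2, 101.2, 100, 10)

print(round(perfect, 1))    # -4.1
print(round(imperfect, 1))  # 19.1
print(round(fit_slope, 1))  # 2.4
```

The three printed values match the 4.1% decrease, the 19.1% increase, and the 2.4 LYG/unit change reported in the abstract.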
e21596 Background: There are ongoing efforts to understand and predict exceptional response to existing cancer therapies, but few clinical characteristics of these patients are known. We trained a machine learning model using the Concerto HealthAI database of oncology EMR data, which includes clinical data from CancerLinQ Discovery, to predict slow progression, a proxy for exceptional response, in aNSCLC in the second-line setting. Methods: We trained an XGBoost model to predict patients with a progression-free survival (PFS) greater than 180 days from the start of second-line therapy (index date). This cutoff approximately determines the top 20% of PFS values in our database (median PFS = 86 days). Patients were included in the study if they (1) had pathologically confirmed aNSCLC without other primary cancer diagnoses and (2) started their second-line therapy between 2013 and 2017. Patients were labeled as slow progressors if they (1) had no evidence of progression or death within 180 days of index and (2) were evaluated for progression for at least 180 days post-index. The model considered data up to 120 days prior to index date. Risk factors in the model included demographics, vitals, common labs, common medical conditions, ECOG performance status, stage, histology, prior cancer treatment patterns, prior progression/response assessments, and medication history. Feature importance was evaluated using SHapley Additive exPlanations (SHAP). Results: 2205 patients met the selection criteria of the study. Of these, 420 were labeled as slow progressors. 1776 patients were used for model training and 429 were set aside for model validation. The final model was able to predict slow progression with an AUCROC of 0.75 (F-score 0.48, precision 0.39, recall 0.6). The performance compares favorably to that of a logistic regression model (0.66 AUCROC).
Top features that indicated slow progression included a low number of prior progression events or regimens, absence of metastatic disease, lower stage/t-stage/ECOG, absence of COPD, previous treatment with an EGFR inhibitor, normal Alk-Phos/WBC (versus elevated), absence of tachycardia, and a normal BMI (versus low). Conclusions: Machine learning and real-world data provided promising results in predicting slow progression in aNSCLC and may be useful in discovering novel drivers of favorable response.
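The labeling rule for slow progressors given in the Methods can be sketched as below. This is an illustrative reading, not the authors' code: the `PatientFollowUp` record, its field names, and the treatment of an event exactly on day 180 as falling "within 180 days" are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientFollowUp:
    # Days from index date to first progression or death; None if no event was observed.
    days_to_progression_or_death: Optional[int]
    # Days from index date over which the patient was evaluated for progression.
    days_of_progression_followup: int


def is_slow_progressor(patient: PatientFollowUp, cutoff_days: int = 180) -> bool:
    """Apply both labeling criteria from the abstract."""
    # (1) No evidence of progression or death within the cutoff window.
    no_early_event = (
        patient.days_to_progression_or_death is None
        or patient.days_to_progression_or_death > cutoff_days
    )
    # (2) Evaluated for progression for at least the cutoff window.
    evaluated_long_enough = patient.days_of_progression_followup >= cutoff_days
    return no_early_event and evaluated_long_enough


examples = [
    PatientFollowUp(None, 200),  # no event, enough follow-up
    PatientFollowUp(150, 400),   # progressed early
    PatientFollowUp(None, 120),  # no event, but follow-up too short
    PatientFollowUp(250, 300),   # progressed after the cutoff
]
labels = [is_slow_progressor(p) for p in examples]
print(labels)
```

The second criterion matters because a patient with no recorded event but short follow-up cannot be distinguished from one who progressed unobserved, so such patients are not labeled as slow progressors.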