Objective: Overall survival (OS) for advanced-stage (IIIA-IV) non-small cell lung cancer (NSCLC) is highly variable, and retrospective data show a survival advantage for patients receiving therapeutic-intent pulmonary resection. We hypothesized that this variability in OS can be modeled separately by stage to allow a personalized estimate of OS. Methods: In a cohort of advanced-stage NSCLC patients from the National Cancer Database, we assessed the accuracy of the Surgical Selection Score (SSS) in predicting OS using Cox proportional hazards models and determined, by stage, the effect of surgery on survival among patients with similarly high SSS. Results: A total of 300,572 patients were identified; 18,701 (6%) had surgery. The SSS was a strong predictor of OS (C-index, 0.89; 95% CI, 0.89-0.90). We observed significantly higher OS (p<0.001) among patients who had surgery. The hazard of death was at least 2 times higher for patients in the upper quartile of SSS who did not receive surgery than for surgical patients, even when adjusting for the SSS (Stage IIIA: Hazard Ratio (HR) 2.
The performance of Partial Least Squares (PLS) regression in predicting the output with multivariate cross- and autocorrelated data is studied. With many correlated predictors of varying importance, PLS does not always predict well, and we propose a modified algorithm, Partitioned Partial Least Squares (PPLS). In PPLS the predictors are partitioned into smaller subgroups, and the important subgroups with high prediction power are identified. Finally, a regular PLS analysis using only those subgroups is performed. The proposed PPLS algorithm is used in the analysis of data from a real pharmaceutical batch fermentation process for which the process variables follow certain profiles during a specific fermentation period. We observed that PPLS leads to a more accurate prediction of the yield of the fermentation process and an easier interpretation, since fewer predictors are used in the final PLS prediction. In the application, important issues such as alignment of the profiles from one batch to another and standardization of the predictors are also addressed. For instance, in PPLS noise magnification due to standardization does not seem to create problems as it might in regular PLS. Finally, PPLS is compared to several recently proposed functional PLS and PCR methods and to a genetic algorithm for variable selection. Specifically, for a couple of publicly available data sets with near infrared spectra, it is shown that overall PPLS has lower cross-validated error than PLS, PCR, and the functional modifications thereof, and is similar in performance to a more complex genetic algorithm.
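The three PPLS steps named in the abstract (partition the predictors, identify the subgroups with high prediction power, run regular PLS on those subgroups) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the contiguous equal-size partition, the use of cross-validated RMSE as the measure of prediction power, the fixed number of retained subgroups `n_keep`, and the function names `pls1_fit`, `cv_rmse`, and `ppls`. The data are synthetic.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Single-response PLS via NIPALS; returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(min(n_components, X.shape[1])):
        w = Xk.T @ yk                      # weight: covariance direction
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                         # score vector
        tt = t @ t
        p = Xk.T @ t / tt                  # X loading
        c = (yk @ t) / tt                  # y loading
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - c * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean


def cv_rmse(X, y, n_components, n_folds=5):
    """Cross-validated RMSE of PLS using contiguous folds."""
    idx = np.arange(len(y))
    sq_err = 0.0
    for test_idx in np.array_split(idx, n_folds):
        train_idx = np.setdiff1d(idx, test_idx)
        coef, xm, ym = pls1_fit(X[train_idx], y[train_idx], n_components)
        pred = (X[test_idx] - xm) @ coef + ym
        sq_err += np.sum((y[test_idx] - pred) ** 2)
    return float(np.sqrt(sq_err / len(y)))


def ppls(X, y, n_blocks, n_keep, n_components=2):
    """Assumed PPLS sketch: score each block by CV error, refit on the best."""
    blocks = np.array_split(np.arange(X.shape[1]), n_blocks)
    errors = [cv_rmse(X[:, b], y, n_components) for b in blocks]
    keep = sorted(int(k) for k in np.argsort(errors)[:n_keep])
    cols = np.concatenate([blocks[k] for k in keep])
    model = pls1_fit(X[:, cols], y, n_components)  # final regular PLS
    return keep, cols, errors, model


# Synthetic example: only predictors 10..19 (block 1 of 4) carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 40))
y = X[:, 10:20].sum(axis=1) + 0.1 * rng.normal(size=60)
keep, cols, errors, model = ppls(X, y, n_blocks=4, n_keep=1)
print("retained blocks:", keep)
```

In this sketch the final model uses only the columns in `cols`, which mirrors the abstract's point that fewer predictors enter the final PLS prediction. How the paper actually forms the subgroups and decides which ones are important is not stated in the abstract.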
Summary. An automated approach to extract interpretable features of univariate or multivariate profiles (functional data) is proposed. A landmark alignment algorithm is modified and the alignment is combined with piecewise linear approximations. Least absolute shrinkage and selection operator (lasso) regression is used to select the most important intercepts and slopes and yields an alternative to partial least squares for modeling a response associated with the profiles. Latent variables can be difficult to interpret, but our extracted features simply correspond to slopes and intercepts of particular parts of the profiles. Features that relate to the degree of warping between a given profile and a reference can also be extracted as predictors. Selection criteria for the number of knots and common knot locations between profiles are developed. We apply our proposed method to batch fermentation data where the profiles consist of on‐line measurements of process variables and the corresponding yield of the process. The extracted features have good interpretability (with a large reduction in dimension) and, in combination with the lasso, have prediction accuracy comparable with that of partial least squares applied to the original profiles. Our proposed feature extraction method is also applied to publicly available data where near infrared spectra define the profiles, and the prediction accuracy of our feature lasso method is comparable with those of more complicated alternatives.
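The idea of using piecewise linear intercepts and slopes as lasso predictors can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's procedure: the landmark alignment and warping features are omitted, the common knots are fixed by hand instead of chosen by the selection criteria the abstract mentions, the lasso is a plain coordinate-descent implementation, and the names `segment_features` and `lasso_cd` and the synthetic profiles are hypothetical.

```python
import numpy as np


def segment_features(t, profile, knots):
    """Least-squares line per segment; returns [intercept, slope, ...]."""
    feats = []
    for lo, hi in zip(knots[:-1], knots[1:]):
        mask = (t >= lo) & (t <= hi)
        slope, intercept = np.polyfit(t[mask], profile[mask], 1)
        feats.extend([intercept, slope])
    return np.array(feats)


def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + alpha * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()             # residual y - X @ b
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]            # remove feature j from the fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]            # add updated feature j back
    return b


# Synthetic profiles: continuous, two linear pieces, knot at t = 0.5.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
knots = [0.0, 0.5, 1.0]                    # assumed common knots
n = 80
a, s1, s2 = rng.normal(size=(3, n))
profiles = np.where(
    t <= 0.5,
    a[:, None] + s1[:, None] * t,
    a[:, None] + 0.5 * s1[:, None] + s2[:, None] * (t - 0.5),
)
profiles += 0.05 * rng.normal(size=profiles.shape)
y = 3.0 * s2 + 0.1 * rng.normal(size=n)    # response driven by second slope

names = ["intercept_1", "slope_1", "intercept_2", "slope_2"]
F = np.array([segment_features(t, p, knots) for p in profiles])
Fz = (F - F.mean(axis=0)) / F.std(axis=0)  # standardize before the lasso
coef = lasso_cd(Fz, y - y.mean(), alpha=0.1)
print(dict(zip(names, np.round(coef, 3))))
```

Each nonzero coefficient maps directly to a named slope or intercept of a specific part of the profile, which is the interpretability advantage over latent variables that the abstract describes.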