We investigated the sensitivity of regional tumor response prediction to variability in voxel clustering techniques, imaging features, and machine learning algorithms in 25 patients with locally advanced non-small cell lung cancer (LA-NSCLC) enrolled on the FLARE-RT clinical trial. Metabolic tumor volumes (MTV) from pre-chemoradiation (PETpre) and midchemoradiation FDG-PET images (PETmid) were subdivided into K-means or hierarchical voxel clusters by standardized uptake values (SUV) and 3D-positions. MTV cluster separability was evaluated by CH index, and morphologic changes were captured by Dice similarity and centroid Euclidean distance. PETpre conventional features included SUVmean, MTV / MTV cluster size, and mean radiation dose. PETpre radiomics consisted of 41 intensity histogram and 3D texture features (PET Oncology Radiomics Test Suite) extracted from MTV or MTV clusters. Machine learning models (multiple linear regression, support vector regression, logistic regression, support vector machines) of conventional features or radiomic features were constructed to predict PETmid response. Leave-one-out-cross-validated root-mean-squared-error (RMSE) for continuous response regression (ΔSUVmean) and area-under-receiver-operating-characteristic-curve (AUC) for binary response classification were calculated. K-means MTV 2-clusters (MTVhi, MTVlo) achieved maximum CH index separability (Friedman p<0.001). Between PETpre and PETmid, *
Machine learning methods have helped to advance wide range of scientific and technological field in recent years, including computational chemistry. As the chemical systems could become complex with high dimension, feature selection could be critical but challenging to develop reliable machine learning based prediction models, especially for proteins as bio‐macromolecules. In this study, we applied sparse group lasso (SGL) method as a general feature selection method to develop classification model for an allosteric protein in different functional states. This results into a much improved model with comparable accuracy (Acc) and only 28 selected features comparing to 289 selected features from a previous study. The Acc achieves 91.50% with 1936 selected feature, which is far higher than that of baseline methods. In addition, grouping protein amino acids into secondary structures provides additional interpretability of the selected features. The selected features are verified as associated with key allosteric residues through comparison with both experimental and computational works about the model protein, and demonstrate the effectiveness and necessity of applying rigorous feature selection and evaluation methods on complex chemical systems.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.