Centrosome amplification is a hallmark of many types of cancer cells, and clustering of multiple centrosomes is critical for cancer cell survival and proliferation. Human kinesin-14 HSET/KIFC1 is essential for centrosome clustering, and its inhibition leads to the specific killing of cancer cells with extra centrosomes. Since kinesin-14 motor domains are evolutionarily conserved, we conceived a strategy for obtaining kinesin-14 inhibitors using Aspergillus nidulans, based on the previous result that loss of the kinesin-14 KlpA rescues the non-viability of the bimC4 kinesin-5 mutant at 42°C. However, it was unclear whether alteration of BimC or any other non-KlpA protein would be a major factor in reversing the lethality of the bimC4 mutant. Here we performed a genome-wide screen for bimC4 suppressors and obtained fifteen suppressor strains. None of the suppressor mutations maps to bimC. The vast majority of the strains contain mutations in the klpA gene, most of which are missense mutations affecting the C-terminal motor domain. Our study confirms that the bimC4 mutant is suitable for a cell-based screen for chemical inhibitors of kinesin-14. Since the selection is based on enhanced growth rather than diminished growth, cytotoxic compounds can be excluded.
PURPOSE
Cancer registries are important sources of real-world data (RWD) that reveal insights into practice patterns and cancer patient outcomes, but the prevalence of missing data can be high. Machine learning (ML) imputation methods can be applied to large RWD sets, but the performance of these approaches within cancer registries is unclear.
METHODS
We identified non-small cell lung cancer (NSCLC) patients within the National Cancer Database diagnosed in 2014 with complete data in 19 variables of known clinical and prognostic significance. We generated synthetic missing data for each variable, then performed imputation using substitution (control) and five different ML approaches. Imputation efficacy was measured by normalized root-mean-square error (RMSE) for continuous variables and proportion of falsely classified entries (PFC) for categorical variables. We also measured algorithm runtimes and the impact of incorporating imputed values on survival modeling.
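The evaluation scheme described above (mask known values, impute, then score against the held-back truth) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the study's implementation: the data are randomly generated toy values and not National Cancer Database records, values are masked completely at random, the substitution control is taken to be mean/mode substitution, and the RMSE is normalized by the standard deviation of the true masked values, since the abstract does not state which normalization was used. The function names are hypothetical.

```python
import numpy as np


def mask_completely_at_random(n, frac, rng):
    """Flag round(frac * n) of n entries as synthetically missing."""
    mask = np.zeros(n, dtype=bool)
    k = int(round(frac * n))
    mask[rng.choice(n, size=k, replace=False)] = True
    return mask


def normalized_rmse(true, imputed, mask):
    """RMSE over masked entries divided by the std of the true masked values.

    The choice of normalization is an assumption, not taken from the source.
    """
    err = true[mask] - imputed[mask]
    return float(np.sqrt(np.mean(err ** 2)) / np.std(true[mask]))


def pfc(true, imputed, mask):
    """Proportion of masked categorical entries imputed incorrectly."""
    return float(np.mean(true[mask] != imputed[mask]))


# Toy data only: one continuous and one categorical variable.
rng = np.random.default_rng(0)
n = 1000
age = rng.normal(68, 10, size=n)
stage = rng.choice(np.array(["I", "II", "III", "IV"]), size=n)

# Hide 10% of the values, then impute them with a substitution baseline.
mask = mask_completely_at_random(n, 0.10, rng)

age_imp = age.copy()
age_imp[mask] = age[~mask].mean()            # mean substitution (assumed control)

values, counts = np.unique(stage[~mask], return_counts=True)
stage_imp = stage.copy()
stage_imp[mask] = values[np.argmax(counts)]  # mode substitution (assumed control)

nrmse = normalized_rmse(age, age_imp, mask)
pfc_val = pfc(stage, stage_imp, mask)
print(f"NRMSE (mean substitution): {nrmse:.3f}")
print(f"PFC (mode substitution):   {pfc_val:.3f}")
```

Under this normalization, substituting a single constant can never score below 1, so values well under 1 indicate that an imputation method is recovering real structure in the data. Lower is better for both metrics.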
RESULTS
A total of 50,790 NSCLC patients were included in this study, with 81 features for each patient after data preprocessing. Among the tested ML methods, SoftImpute had the lowest RMSE (best performance) for continuous variables, ranging from 0.071 to 0.080 for 10% to 50% missing data, and MissForest had the lowest PFC (best performance) for categorical variables, ranging from 0.251 to 0.311 for 10% to 50% missing data. SoftImpute had a runtime of 3.28 × 10⁻⁴ seconds per patient record, and MissForest averaged 2.96 × 10⁻³ seconds per patient record. Deep learning imputation using a denoising autoencoder did not achieve improved performance despite higher algorithm runtimes. Cox models incorporating ML-imputed data achieved similar C-indices, ranging from 0.787 to 0.801 across all ML methods tested.
CONCLUSION
ML imputation achieved promising performance for NSCLC patients within a large national cancer registry.