With recent advances in DNA sequencing technologies, fast acquisition of large-scale genomic data has become commonplace. In cancer studies in particular, there is a growing need to classify cancer types based on somatic alterations detected in sequencing analyses. However, the ever-increasing size and complexity of the data make this classification task extremely challenging. In this study, we evaluate the contributions of various input features that can be derived from genomic data, such as mutation profiles, mutation rates, mutation spectra and signatures, and somatic copy number alterations, and use them for accurate cancer type classification. We introduce a novel ensemble of machine learning classifiers, called CPEM (Cancer Predictor using an Ensemble Model), which we test on 7,002 samples representing over 31 different cancer types collected from The Cancer Genome Atlas (TCGA) database. We first systematically examined the impact of the input features. Features known to be associated with specific cancers had relatively high importance in our initial prediction model. We then investigated various machine learning classifiers and feature selection methods to derive an ensemble-based cancer type prediction model that achieves up to 84% classification accuracy under nested 10-fold cross-validation. Finally, we narrowed the target cancers down to the six most common types and achieved up to 94% accuracy.
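The abstract reports accuracy under nested 10-fold cross-validation of an ensemble classifier. The toy sketch below illustrates only that evaluation protocol; it is not CPEM. The ensemble members (a k-NN classifier and two nearest-centroid variants), the tuned hyperparameter `k`, the 5-fold inner loop, and the synthetic data are all hypothetical choices made for illustration, and it uses only the Python standard library. The key property is that the inner loop selects `k` using the outer-training split alone, so the outer accuracy is not inflated by model selection.

```python
import math
import random
from collections import Counter
from statistics import mean


def euclidean(a, b):
    return math.sqrt(sum((x - z) ** 2 for x, z in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - z) for x, z in zip(a, b))


def centroids(X, y):
    """Mean feature vector per class label present in the training data."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {c: [mean(col) for col in zip(*rows)] for c, rows in groups.items()}


def predict_centroid(cents, x, dist):
    return min(cents, key=lambda c: dist(cents[c], x))


def predict_knn(X, y, x, k):
    nearest = sorted(range(len(X)), key=lambda i: euclidean(X[i], x))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


def ensemble_predict(X_train, y_train, X_test, k):
    """Hypothetical majority vote of three toy members; k-NN is listed first, so it wins ties."""
    cents = centroids(X_train, y_train)
    preds = []
    for x in X_test:
        votes = [
            predict_knn(X_train, y_train, x, k),
            predict_centroid(cents, x, euclidean),
            predict_centroid(cents, x, manhattan),
        ]
        preds.append(Counter(votes).most_common(1)[0][0])
    return preds


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def k_folds(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def take(seq, idx):
    return [seq[i] for i in idx]


def nested_cv(X, y, candidates=(1, 3, 5), outer_k=10, inner_k=5, seed=0):
    """Outer folds estimate accuracy; inner folds pick k from outer-training data only."""
    rng = random.Random(seed)
    scores, chosen = [], []
    for test_idx in k_folds(len(X), outer_k, rng):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in test_set]
        X_tr, y_tr = take(X, train_idx), take(y, train_idx)
        inner = k_folds(len(X_tr), inner_k, rng)
        best_k, best_score = None, -1.0
        for k in candidates:
            fold_scores = []
            for val_idx in inner:
                val_set = set(val_idx)
                fit_idx = [i for i in range(len(X_tr)) if i not in val_set]
                preds = ensemble_predict(take(X_tr, fit_idx), take(y_tr, fit_idx), take(X_tr, val_idx), k)
                fold_scores.append(accuracy(take(y_tr, val_idx), preds))
            if mean(fold_scores) > best_score:  # strict '>' keeps the smaller k on ties
                best_k, best_score = k, mean(fold_scores)
        # Refit on the full outer-training split with the selected k, then score the held-out fold.
        preds = ensemble_predict(X_tr, y_tr, take(X, test_idx), best_k)
        scores.append(accuracy(take(y, test_idx), preds))
        chosen.append(best_k)
    return mean(scores), chosen


if __name__ == "__main__":
    # Synthetic, well-separated 2-D clusters standing in for real feature vectors.
    rng = random.Random(42)
    X, y = [], []
    for label, (cx, cy) in {"A": (0, 0), "B": (10, 0), "C": (0, 10)}.items():
        for _ in range(30):
            X.append([rng.gauss(cx, 1), rng.gauss(cy, 1)])
            y.append(label)
    acc, ks = nested_cv(X, y)
    print(f"nested CV accuracy: {acc:.2f}; k chosen per outer fold: {ks}")
```

In a real study the members, features, and search space would come from the feature-selection and classifier comparisons the abstract describes; only the split discipline carries over from this sketch.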
With recent advances in self-supervised learning, paired clean-noisy data are no longer required for deep learning-based image denoising. However, existing blind denoising methods still rely on assumptions about noise characteristics, such as a zero-mean noise distribution and pixel-wise independence between noise and signal; this hinders their wide adoption in the medical domain. Unpaired learning, on the other hand, can overcome the limitations of such assumptions and makes training data easier to collect in real-world scenarios. In this paper, we propose a novel image denoising scheme, Interdependent Self-Cooperative Learning (ISCL), that leverages unpaired learning by combining cyclic adversarial learning with self-supervised residual learning. Unlike existing unpaired image denoising methods that rely on matching data distributions across domains, the two architectures in ISCL, designed for different tasks, complement each other and boost the learning process. To assess the performance of the proposed method, we conducted extensive experiments on various biomedical image degradation scenarios, such as noise caused by the physical characteristics of electron microscopy (EM) devices (film and charging noise) and structural noise found in low-dose computed tomography (CT). We demonstrate that our method produces higher image quality than both conventional and current state-of-the-art deep learning-based unpaired image denoising methods.
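For readers unfamiliar with the two building blocks named above, their standard textbook forms are sketched below. These are the generic CycleGAN-style cycle-consistency loss and the usual residual (noise-predicting) denoiser formulation, not ISCL's exact training objective, which the abstract does not state. The symbols are our own notation: $X$ is the noisy domain, $Y$ the clean domain, $G: X \to Y$ and $F: Y \to X$ are the two mapping networks, and $R$ is a network that estimates the noise component of a noisy image.

```latex
% Generic cycle-consistency loss for unpaired translation between X (noisy) and Y (clean)
\mathcal{L}_{\text{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_X}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_Y}\big[\lVert G(F(y)) - y \rVert_1\big]

% Generic residual denoising: predict the noise, then subtract it from the input
\hat{y} = x - R(x)
```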