Diagnosing cancer and identifying disease genes from DNA microarray gene expression data are hot topics in current bioinformatics. This paper is devoted to recent developments in cancer diagnosis and gene selection via statistical machine learning. The support vector machine is first introduced for binary cancer diagnosis. Then the 1-norm support vector machine, doubly regularized support vector machine, adaptive huberized support vector machine, and other extensions are presented to improve the performance of gene selection. The lasso, elastic net, partly adaptive elastic net, group lasso, sparse group lasso, adaptive sparse group lasso, and other sparse regression methods are also introduced for performing binary cancer classification and gene selection simultaneously. In addition to three strategies for reducing multiclass problems to binary ones, methods that directly consider all classes of data in a single learning model (multi-class support vector machine, sparse multinomial regression, adaptive multinomial regression, and so on) are presented for multiclass cancer diagnosis. Limitations and promising directions are also discussed.
Background:
Cancer seriously threatens human health, and diagnosing cancer via gene expression analysis is a hot topic in cancer research.
Objective:
To accurately diagnose the type of lung cancer and discover the pathogenic genes.
Method:
In this study, affinity propagation (AP) clustering with a similarity score is applied to each type of lung cancer and to normal lung. After the genes are grouped, sparse group lasso is adopted to construct four binary classifiers, and a voting strategy is used to integrate them.
Results:
This study screens six gene groups, out of 73 gene groups, that may be associated with different lung cancer subtypes, and identifies three possible key pathogenic genes: KRAS, BRAF and VDR. Furthermore, it achieves improved classification accuracy on the minority classes SQ and COID compared with four other methods.
Conclusion:
We propose AP-clustering-based sparse group lasso (AP-SGL), which provides an alternative for simultaneous diagnosis and gene selection in lung cancer.
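To illustrate how a sparse group lasso penalty can discard whole gene groups while also dropping individual genes inside the retained groups, the following is a minimal sketch of its proximal operator. It is not the authors' implementation: the function names, the toy coefficients, the group labels, the `sqrt(p_g)` group weights, and the assumption of non-overlapping groups are all illustrative choices.

```python
import math


def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by `threshold` (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_group_lasso_prox(beta, groups, lam, alpha, step=1.0):
    """Proximal operator of the sparse group lasso penalty

        lam * ((1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2 + alpha * ||beta||_1)

    `groups` maps a group label to the list of coefficient indices it contains.
    Assumption: groups do not overlap.
    """
    result = list(beta)
    for indices in groups.values():
        # Step 1: element-wise soft-thresholding gives within-group sparsity.
        shrunk = [soft_threshold(beta[j], step * lam * alpha) for j in indices]
        # Step 2: group-wise shrinkage can set the whole group to zero.
        norm = math.sqrt(sum(v * v for v in shrunk))
        group_threshold = step * lam * (1.0 - alpha) * math.sqrt(len(indices))
        scale = max(0.0, 1.0 - group_threshold / norm) if norm > 0 else 0.0
        for j, v in zip(indices, shrunk):
            result[j] = scale * v
    return result


# Hypothetical toy data: five gene coefficients split into two gene groups.
beta = [3.0, -0.2, 2.0, 0.3, -0.1]
groups = {"group_a": [0, 1, 2], "group_b": [3, 4]}
shrunk_beta = sparse_group_lasso_prox(beta, groups, lam=1.0, alpha=0.5)
selected = [j for j, b in enumerate(shrunk_beta) if b != 0.0]
```

In this toy run the second group is removed entirely, and within the first group the weak coefficient at index 1 is set to zero, so `selected` keeps only indices 0 and 2. In a setting like the one described above, the groups would come from a clustering step rather than being written by hand.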
In view of the challenges that group Lasso penalty methods face in multicancer microarray data analysis, e.g., the need to divide genes into groups in advance and limited biological interpretability, we propose a robust adaptive multinomial regression with sparse group Lasso penalty (RAMRSGL) model. Adopting an overlapping clustering strategy, affinity propagation clustering is applied to the genes of each cancer subtype to explore the group structure of that subtype, and the groups of all subtypes are then merged. In addition, data-driven weights based on noise are added to the sparse group Lasso penalty, which is combined with the multinomial log-likelihood function to perform multiclass classification and adaptive group gene selection simultaneously. Experimental results on acute leukemia data verify the effectiveness of the proposed method.
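For orientation, a generic multinomial regression objective with a weighted (adaptive) sparse group lasso penalty can be written as below. The notation is assumed here rather than taken from the paper: $y_{ik}$ indicates whether sample $i$ belongs to class $k$, $x_i$ is its expression vector, $\beta^{(g)}$ collects the coefficients of the genes in group $g$ across all $K$ classes, $\beta^{(j)}$ collects the coefficients of gene $j$ across all classes, and $w_g$, $v_j$ stand in for the data-driven weights, whose noise-based construction is not reproduced.

```latex
\min_{\beta}\;
-\frac{1}{n}\sum_{i=1}^{n}\left[\sum_{k=1}^{K} y_{ik}\,\bigl(\beta_{0k} + x_i^{\top}\beta_k\bigr)
- \log\sum_{l=1}^{K}\exp\bigl(\beta_{0l} + x_i^{\top}\beta_l\bigr)\right]
+ \lambda\left[(1-\alpha)\sum_{g=1}^{G} w_g\,\bigl\lVert \beta^{(g)} \bigr\rVert_2
+ \alpha\sum_{j=1}^{p} v_j\,\bigl\lVert \beta^{(j)} \bigr\rVert_1\right]
```

The first term is the negative multinomial log-likelihood. The group term can remove an entire gene group, while the weighted $\ell_1$ term selects individual genes within the groups that remain.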