As two major components of tackling high-dimensional cancer microarray gene data sets, feature selection and classification have attracted increasing interest in academia and the medical community. Because cancer gene expression data sets suffer from small sample sizes, high dimensionality, and class imbalance, extracting useful gene information and classifying effectively become more challenging. In this paper, we propose a novel feature selection algorithm for classification, called ISVM-RFE(FPD), which fully utilizes the classification performance of each feature subset. Compared with existing algorithms, ISVM-RFE(FPD) takes into account not only the intrinsic characteristics of the data but also both linear and nonlinear correlations among features. The experimental results demonstrate that ISVM-RFE(FPD) outperforms existing SVM-based feature selection algorithms in terms of the recall rate of positive samples (rr_p) and the G-mean (G).
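The two evaluation metrics can be made concrete with a short sketch. It assumes the standard imbalanced-learning definitions, with rr_p as the recall (true-positive rate) of the positive class and G as the geometric mean of the positive-class and negative-class recalls, and it labels the positive class 1; the function name `imbalance_metrics` is hypothetical and the paper's exact conventions may differ.

```python
import math
from typing import Sequence, Tuple


def imbalance_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (rr_p, G) for a binary classification result.

    ASSUMED definitions (standard in imbalanced learning):
      rr_p = TP / (TP + FN)                 recall of the positive class
      G    = sqrt(rr_p * TN / (TN + FP))    geometric mean of both class recalls
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    # Guard against an empty class so the function never divides by zero.
    rr_p = tp / (tp + fn) if tp + fn else 0.0
    rr_n = tn / (tn + fp) if tn + fp else 0.0
    return rr_p, math.sqrt(rr_p * rr_n)


# Hypothetical example: 2 positive and 4 negative samples.
rr_p, g = imbalance_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
print(rr_p, g)  # rr_p = 0.5, G = sqrt(0.5 * 0.75)
```

Because G multiplies the two class recalls, a classifier that ignores the minority class gets G = 0 even when its overall accuracy is high, which is why both metrics suit imbalanced gene expression data.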
The outbreak of coronavirus disease 2019 (COVID-19) has caused a global disaster, seriously endangering human health and the stability of social order. The purpose of this study is to construct a nonlinear combinational dynamic transmission rate model with automatic selection, based on the forecasting effective measure (FEM) and support vector regression (SVR), to overcome two shortcomings: the difficulty of accurately estimating the basic reproduction number R0 and the low accuracy of single-model predictions. We apply the model to analyze and predict the COVID-19 outbreak in different countries. First, the discrete values of the dynamic transmission rate are calculated. Second, the prediction abilities of all single models are comprehensively considered, and the best sliding window period is derived. Then, the optimal sub-models are selected based on FEM, and their prediction results are combined nonlinearly. Finally, the resulting nonlinear combinational dynamic transmission rate model is used to analyze and predict the COVID-19 epidemic in the United States, Canada, Germany, Italy, France, Spain, South Korea, and Iran during the global pandemic. The experimental results show that our model achieves an average out-of-sample forecasting error rate below 10.07%, and that its predicted epidemic inflection points agree well with the real data in most countries. In addition, the model shows good noise robustness and stability when dealing with data fluctuations.
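The first and third steps can be sketched as follows. Two assumptions are made here and labeled in the code: the discrete transmission rate is derived from a standard discrete SIR model, and FEM is taken as the mean forecast accuracy multiplied by one minus its standard deviation, a common definition in the combination-forecasting literature. The paper's own formulas, its sliding-window search, and its SVR-based nonlinear combination are not reproduced, and all function and model names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def dynamic_transmission_rate(S: Sequence[float], I: Sequence[float], N: float) -> List[float]:
    """Discrete transmission rates beta_t under an ASSUMED discrete SIR model.

    S[t+1] - S[t] = -beta_t * S[t] * I[t] / N
    =>  beta_t = N * (S[t] - S[t+1]) / (S[t] * I[t])
    """
    return [N * (S[t] - S[t + 1]) / (S[t] * I[t]) for t in range(len(S) - 1)]


def forecasting_effectiveness(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """ASSUMED FEM: mean(A) * (1 - std(A)), with accuracy A_t = 1 - min(|e_t / x_t|, 1)."""
    acc = [1.0 - min(abs((x - f) / x), 1.0) for x, f in zip(actual, forecast)]
    return statistics.mean(acc) * (1.0 - statistics.pstdev(acc))


def select_submodels(actual: Sequence[float], forecasts: Dict[str, List[float]], k: int) -> List[str]:
    """Keep the k single models with the highest FEM on a validation window."""
    scores = {name: forecasting_effectiveness(actual, f) for name, f in forecasts.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data (not from the paper).
beta = dynamic_transmission_rate(S=[1000, 990, 975], I=[10, 15, 20], N=1000)
best = select_submodels(
    actual=[100, 100],
    forecasts={"model_a": [100, 100], "model_b": [90, 110], "model_c": [100, 80]},
    k=2,
)
print(beta, best)
```

Under this FEM definition, a model with steady errors ("model_b") ranks above one with the same mean accuracy but more erratic errors ("model_c"), which is the property that makes the measure useful for choosing sub-models.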
This paper proposes a new multi-kernel learning ensemble algorithm, called Ada-L1MKL-WSVR, which can be regarded as an extension of multi-kernel learning (MKL) and weighted support vector regression (WSVR). The first novelty is adding the L1 norm of the weights of the combined kernel function to the objective function of WSVR, which adaptively selects the optimal base models and their parameters. In addition, an accelerated method based on the fast iterative shrinkage-thresholding algorithm (FISTA) is developed to solve for the weights of the combined kernel function. The second novelty is an ensemble learning framework based on AdaBoost, named Ada-L1MKL-WSVR, into which FISTA is integrated. At each iteration, the weights of the combined kernel function and the weights of the training samples are updated together. The output is an ensemble of the resulting regression functions. The experimental results show that the proposed algorithm is more effective and reliable than several existing methods.
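To show how FISTA handles an L1 penalty, the sketch below applies it to a plain L1-regularized least-squares problem (LASSO). This loss is swapped in for illustration: in Ada-L1MKL-WSVR the L1 norm acts on the combined-kernel weights inside the WSVR objective, which is not reproduced here. The step size uses the squared Frobenius norm of A as a safe upper bound on the gradient's Lipschitz constant; function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def _matvec(A: Matrix, x: Sequence[float]) -> List[float]:
    # Computes A x.
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def _rmatvec(A: Matrix, r: Sequence[float]) -> List[float]:
    # Computes A^T r.
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft_threshold(v: Sequence[float], tau: float) -> List[float]:
    # Proximal operator of tau * ||.||_1 (the "shrinkage-thresholding" step).
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in v]


def fista_l1(A: Matrix, y: Sequence[float], lam: float, iters: int = 500) -> List[float]:
    """Minimise 0.5 * ||A w - y||^2 + lam * ||w||_1 with FISTA (illustrative LASSO loss)."""
    # ||A||_F^2 upper-bounds lambda_max(A^T A), the Lipschitz constant of the gradient.
    L = sum(a * a for row in A for a in row)
    w = [0.0] * len(A[0])
    v = w[:]  # extrapolated point
    t = 1.0
    for _ in range(iters):
        residual = [p - yi for p, yi in zip(_matvec(A, v), y)]
        grad = _rmatvec(A, residual)
        # Gradient step from v, then soft-thresholding.
        w_next = soft_threshold([vi - gi / L for vi, gi in zip(v, grad)], lam / L)
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Nesterov-style momentum: the step that distinguishes FISTA from plain ISTA.
        v = [wn + (t - 1.0) / t_next * (wn - wo) for wn, wo in zip(w_next, w)]
        w, t = w_next, t_next
    return w


print(fista_l1([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0))  # approx [2.0, 0.0]
```

The second weight is driven exactly to zero, which is the sparsity effect the abstract relies on when it uses the L1 norm to select base models.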
Federated Learning (FL) has emerged as a de facto machine learning paradigm and has received rapidly increasing research interest from the community. However, catastrophic forgetting caused by data heterogeneity and partial participation poses distinctive challenges for FL and degrades model performance. To tackle these problems, we propose a new FL approach, GradMA, which takes inspiration from continual learning to simultaneously correct the server-side and worker-side update directions while taking full advantage of the server's rich computing and memory resources. Furthermore, we develop a memory reduction strategy that enables GradMA to accommodate FL with a large number of workers. We then analyze the convergence of GradMA theoretically in the smooth non-convex setting and show that its convergence rate achieves a linear speedup with respect to the number of sampled active workers. Finally, our extensive experiments on various image classification tasks show that GradMA achieves significant gains in accuracy and communication efficiency over state-of-the-art (SOTA) baselines.