In high-dimensional data, the performances of various classifiers are largely dependent on the selection of important features. Most of the individual classifiers with the existing feature selection (FS) methods do not perform well for highly correlated data. Obtaining important features using the FS method and selecting the best performing classifier is a challenging task in high throughput data. In this article, we propose a combination of resampling-based least absolute shrinkage and selection operator (LASSO) feature selection (RLFS) and ensembles of regularized regression (ERRM) capable of dealing data with the high correlation structures. The ERRM boosts the prediction accuracy with the top-ranked features obtained from RLFS. The RLFS utilizes the lasso penalty with sure independence screening (SIS) condition to select the top k ranked features. The ERRM includes five individual penalty based classifiers: LASSO, adaptive LASSO (ALASSO), elastic net (ENET), smoothly clipped absolute deviations (SCAD), and minimax concave penalty (MCP). It was built on the idea of bagging and rank aggregation. Upon performing simulation studies and applying to smokers’ cancer gene expression data, we demonstrated that the proposed combination of ERRM with RLFS achieved superior performance of accuracy and geometric mean.
Dementia is a degenerative disease that is increasingly prevalent in an aging society. Alzheimer’s disease (AD), the most common type of dementia, is best mitigated via early detection and management. Deep learning is an artificial intelligence technique that has been used to diagnose and predict diseases by extracting meaningful features from medical images. The convolutional neural network (CNN) is a representative application of deep learning, serving as a powerful tool for the diagnosis of AD. Recently, vision transformers (ViT) have yielded classification performance exceeding that of CNN in some diagnostic image classifications. Because the brain is a very complex network with interrelated regions, ViT, which captures direct relationships between images, may be more effective for brain image analysis than CNN. Therefore, we propose a method for classifying dementia images by applying 18F-Florbetaben positron emission tomography (PET) images to ViT. Data were evaluated via binary (normal control and abnormal) and ternary (healthy control, mild cognitive impairment, and AD) classification. In a performance comparison with the CNN, VGG19 was selected as the comparison model. Consequently, ViT yielded more effective performance than VGG19 in binary classification. However, in ternary classification, the performance of ViT cannot be considered excellent. These results show that it is hard to argue that the ViT model is better at AD classification than the CNN model.
This paper introduces methodologies in forecasting oil prices (Brent and WTI) with multivariate time series of major S&P 500 stock prices using Gaussian process modeling, deep learning, and vine copula regression. We also apply Bayesian variable selection and nonlinear principal component analysis (NLPCA) for data dimension reduction. With a reduced number of important covariates, we also forecast oil prices (Brent and WTI) with multivariate time series of major S&P 500 stock prices using Gaussian process modeling, deep learning, and vine copula regression. To apply real data to the proposed methods, we select monthly log returns of 2 oil prices and 74 large-cap, major S&P 500 stock prices across the period of February 2001–October 2019. We conclude that vine copula regression with NLPCA is superior overall to other proposed methods in terms of the measures of prediction errors.
The high-throughput correlated DNA methylation (DNAmeth) dataset generated from Illumina Infinium Human Methylation 27 (IIHM 27K) BeadChip assay. In the DNAmeth data, there are several CpG sites for every gene, and these grouped CpG sites are highly correlated. Most of the current filtering-based ranking (FBR) methods do not consider the group correlation structures. Obtaining the significant features with the FBR methods and applying these features to the classifiers to attain the best classification accuracy in highly correlated DNAmeth data is a challenging task. In this research, we introduce a resampling of group least absolute shrinkage and selection operator (glasso) FBR method capable of ignoring the unrelated features in the data considering the group correlation among the features. The various classifiers, such as random forests (RF), Naive Bayes (NB), and support vector machines (SVM) with the significant CpGs obtained from the proposed resampling of group lasso-based ranking (RGLR) method helped to boost the classification accuracy. Through simulated and experimental prostate DNAmeth data, we showed that higher performance of accuracy, sensitivity, specificity, and geometric mean is achieved by ignoring the unimportant CpG sites through the RGLR method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.