Protein subcellular localization plays a crucial role in characterizing the function of proteins and understanding various cellular processes. Therefore, accurate identification of protein subcellular location is an important yet challenging task. Numerous computational methods have been proposed to predict the subcellular location of proteins. However, most existing methods have limited capability in terms of the overall accuracy, time consumption and generalization power. To address these problems, in this study, we developed a novel computational approach based on human protein atlas (HPA) data, referred to as PScL-HDeep, for accurate and efficient image-based prediction of protein subcellular location in human tissues. We extracted different handcrafted and deep learned (by employing pretrained deep learning model) features from different viewpoints of the image. The step-wise discriminant analysis (SDA) algorithm was applied to generate the optimal feature set from each original raw feature set. To further obtain a more informative feature subset, support vector machine–based recursive feature elimination with correlation bias reduction (SVM-RFE + CBR) feature selection algorithm was applied to the integrated feature set. Finally, the classification models, namely support vector machine with radial basis function (SVM-RBF) and support vector machine with linear kernel (SVM-LNR), were learned on the final selected feature set. To evaluate the performance of the proposed method, a new gold standard benchmark training dataset was constructed from the HPA databank. PScL-HDeep achieved the maximum performance on 10-fold cross validation test on this dataset and showed a better efficacy over existing predictors. Furthermore, we also illustrated the generalization ability of the proposed method by conducting a stringent independent validation test.
Motivation Characterization of protein subcellular localization has become an important and long-standing task in bioinformatics and computational biology, which provides valuable information for elucidating various cellular functions of proteins and guiding drug design. Results Here, we develop a novel bioimage-based computational approach, termed PScL-DDCFPred, to accurately predict protein subcellular localizations in human tissues. PScL-DDCFPred first extracts multiview image features, including global and local features, as base or pure features; Next, it applies a new integrative feature selection method based on stepwise discriminant analysis and generalized discriminant analysis to identify the optimal feature sets from the extracted pure features; Finally, a classifier based on deep neural network (DNN) and deep-cascade forest (DCF) is established. Stringent ten-fold cross-validation tests on the new protein subcellular localization training dataset, constructed from the human protein atlas databank, illustrates that PScL-DDCFPred achieves a better performance than several existing state-of-the-art methods. Moreover, the independent test set further illustrates the generalization capability and superiority of PScL-DDCFPred over existing predictors. In-depth analysis shows that the excellent performance of PScL-DDCFPred can be attributed to three critical factors, namely the effective combination of the DNN and DCF models, complementarity of global and local features, and use of the optimal feature sets selected by the integrative feature selection algorithm. Availability https://github.com/csbio-njust-edu/PScL-DDCFPred Supplementary information Supplementary data are available at Bioinformatics online.
Hyperspectral unmixing (HSU) is a crucial method to determine the fractional abundance of the material (endmembers) in each pixel. Most spectral unmixing methods are affected by low signal-to-noise ratios because of noisy pixels and bands simultaneously, requiring robust HSU techniques that exploit both 3D (spectral–spatial dimension) and 2D (spatial dimension) domains. In this paper, we present a new method for robust supervised HSU based on a deep hybrid (3D and 2D) convolutional autoencoder (DHCAE) network. Most HSU methods adopt the 2D model for simplicity, whereas the performance of HSU depends on spectral and spatial information. The DHCAE network exploits spectral and spatial information of the remote sensing images for abundance map estimation. In addition, DHCAE uses dropout to regularize the network for smooth learning and to avoid overfitting. Quantitative and qualitative results confirm that our proposed DHCAE network achieved better hyperspectral unmixing performance on synthetic and three real hyperspectral images, i.e., Jasper Ridge, urban and Washington DC Mall datasets.
Motivation Over the past decades, a variety of in silico methods have been developed to predict protein subcellular localization within cells. However, a common and major challenge in the design and development of such methods is how to effectively utilize the heterogenous feature sets extracted from bioimages. In this regards, limited efforts have been undertaken. Results We propose a new two-level stacked autoencoder network (termed 2 L-SAE-SM) to improve its performance by integrating the heterogenous feature sets. In particular, in the 1st-level of 2 L-SAE-SM, each optimal heterogeneous feature set is fed to train our designed stacked autoencoder network (SAE-SM). All the trained SAE-SMs in the 1st-level can output the decision sets based on their respective optimal heterogeneous feature sets, known as ‘intermediate decision’ sets. Such intermediate decision sets are then ensembled using the mean ensemble (ME) method to generate the ‘intermediate feature’ set for the 2nd-level SAE-SM. Using the proposed framework, we further develop a novel predictor, referred to as PScL-2LSAESM, to characterize image-based protein subcellular localization. Extensive benchmarking experiments on the latest benchmark training and independent test datasets collected from the human protein atlas databank demonstrate the effectiveness of the proposed 2 L-SAE-SM framework for the integration of heterogenous feature sets. Moreover, performance comparison of the proposed PScL-2LSAESM with current state-of-the-art methods further illustrates that PScL-2LSAESM clearly outperforms the existing state-of-the-art methods for the task of protein subcellular localization. Availability https://github.com/csbio-njust-edu/PScL-2LSAESM Supplementary information Supplementary data are available at Bioinformatics online.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.