Protein-peptide interactions are crucial for many cellular processes. They are difficult to determine experimentally because peptides are often highly flexible in structure. Accurate sequence-based prediction of peptide-binding residues can facilitate the study of these interactions. In this work, we developed two novel sequence-based methods, SVMpep and PepBind, to identify peptide-binding residues. Recent studies show that protein-peptide binding is closely associated with intrinsic disorder. We therefore incorporated intrinsic disorder into our feature design and developed the ab initio method SVMpep. Experiments show that intrinsic disorder contributes a 1.2-5.2% improvement in the area under the receiver operating characteristic curve (AUC). Compared with the recent sequence-based method SPRINT-Seq, SVMpep improves the AUC and Matthews correlation coefficient (MCC) by at least 7.7% and 70%, respectively. In addition, by combining SVMpep with two template-based methods, S-SITE and TM-SITE, we proposed the consensus-based method PepBind. Compared with the latest structure-based method SPRINT-Str, PepBind improves the AUC and MCC by 1.7% and 28.3%, respectively, on the same independent test set used for SPRINT-Str. The success of PepBind is attributed to the improved prediction of the ab initio method SVMpep, enabled by intrinsic disorder, and to the consensus prediction that combines three complementary methods. A web server implementing the proposed methods is freely available at http://yanglab.nankai.edu.cn/PepBind/.
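The abstract does not say how PepBind combines its three component predictors. The sketch below assumes one simple possibility, a weighted average of per-residue binding probabilities followed by a threshold. The equal weights, the 0.5 threshold, the toy scores, and the function names `consensus_scores` and `binarize` are all hypothetical. It then scores the binary predictions with the standard Matthews correlation coefficient, the MCC metric quoted above.

```python
import math
from typing import List, Sequence

def consensus_scores(method_scores: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Weighted average of per-residue binding probabilities (assumed consensus rule).

    method_scores[m][i] is predictor m's probability that residue i binds a peptide.
    """
    total = sum(weights)
    n_residues = len(method_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, method_scores)) / total
        for i in range(n_residues)
    ]

def binarize(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a residue as binding (1) when its consensus score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]

def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient from binary labels; returns 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Toy scores for six residues from three predictors (hypothetical values).
svm = [0.9, 0.2, 0.7, 0.1, 0.6, 0.3]
s_site = [0.8, 0.1, 0.4, 0.2, 0.7, 0.2]
tm_site = [0.7, 0.3, 0.5, 0.1, 0.5, 0.6]
truth = [1, 0, 1, 0, 1, 1]

pred = binarize(consensus_scores([svm, s_site, tm_site], [1.0, 1.0, 1.0]))
print(pred, round(mcc(truth, pred), 4))  # → [1, 0, 1, 0, 1, 0] 0.7071
```

Averaging calibrated probabilities is only one way to build a consensus. Learned weights or stacking would fit the same interface by changing `weights` or replacing `consensus_scores`.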
Beginning at the end of 2019, the virus that causes COVID-19 spread worldwide, infecting millions of people. Infectious diseases induced by pathogenic microorganisms such as influenza virus, hepatitis virus, and Mycobacterium tuberculosis are...
Background: Lung cancer is one of the most common types of cancer, and lung adenocarcinoma accounts for the largest proportion of cases. Accurate staging is a prerequisite for effective diagnosis and treatment of lung adenocarcinoma. Previous research has mainly used single-modal data, such as gene expression data, for classification and prediction. Multi-modal genetic data (gene expression RNA-seq, methylation data, and copy number variation) collected from the same patient make it possible to combine several modalities for cancer prediction. A new machine learning method, gcForest, has recently been proposed and has been shown to be suitable for classification in several fields. However, the model may face challenges when applied to high-dimensional genetic data with small sample sizes.
Results: In this paper, we propose a multi-weighted gcForest algorithm (MLW-gcForest) to construct a lung adenocarcinoma staging model from multi-modal genetic data. The new algorithm builds on the standard gcForest algorithm. First, different weights are assigned to the random forests according to their classification performance in the standard gcForest model. Second, because the feature vectors generated under different scanning granularities influence the final classification result differently, these feature vectors are weighted according to the proposed sorting optimization algorithm. We then train three MLW-gcForest models on three single-modal datasets (gene expression RNA-seq, methylation data, and copy number variation) and perform decision fusion to stage lung adenocarcinoma. Experimental results suggest that the MLW-gcForest model is superior to the standard gcForest model for lung adenocarcinoma staging and outperforms traditional classification methods. The accuracy, precision, recall, and AUC reached 0.908, 0.896, 0.882, and 0.96, respectively.
Conclusions: The MLW-gcForest model has great potential for lung adenocarcinoma staging and can support the diagnosis and personalized treatment of lung adenocarcinoma. The results suggest that the MLW-gcForest algorithm is effective on multi-modal genetic data, which are high-dimensional and have small sample sizes.
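The abstract names two weighting ideas without giving formulas. The first weights each forest by its classification performance. The second fuses the decisions of the three single-modal models. The sketch below illustrates both as weighted averages of class-probability vectors. It assumes that forest weights are validation accuracies normalized to sum to 1 and that decision fusion uses equal-weight averaging. Both assumptions are stand-ins, not the paper's method. It does not model the sorting optimization algorithm for multi-grained scanning features. All class probabilities, accuracies, and function names are hypothetical.

```python
from typing import List, Sequence

def performance_weights(accuracies: Sequence[float]) -> List[float]:
    """Turn per-forest validation accuracies into weights summing to 1 (assumed scheme)."""
    total = sum(accuracies)
    return [a / total for a in accuracies]

def weighted_average(prob_vectors: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Weighted average of class-probability vectors; weights are expected to sum to 1."""
    n_classes = len(prob_vectors[0])
    return [sum(w * p[c] for w, p in zip(weights, prob_vectors)) for c in range(n_classes)]

def argmax(values: Sequence[float]) -> int:
    """Index of the largest entry, i.e. the predicted class."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical 3-class outputs of two forests in the last cascade level (RNA-seq model).
forest_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
w = performance_weights([0.8, 0.6])  # better-performing forest gets the larger weight
rna = weighted_average(forest_probs, w)

# Decision fusion across modalities: equal weights assumed for illustration.
meth = [0.3, 0.4, 0.3]
cnv = [0.5, 0.3, 0.2]
fused = weighted_average([rna, meth, cnv], [1 / 3, 1 / 3, 1 / 3])
print(argmax(fused))  # → 0
```

Because each input is a probability vector and the weights sum to 1, the averaged outputs remain valid probability distributions at both the forest level and the fusion level.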
Deep convolutional neural network (DCNN) technology has achieved great success in extracting buildings from aerial images. However, current mainstream algorithms perform unsatisfactorily in feature extraction and classification of homesteads, especially in complex rural scenes. This study proposes a deep convolutional neural network for rural homestead extraction, the Multi-Branch Network (MBNet), which consists of a detail branch, a semantic branch, and a boundary branch. In addition, a multi-task joint loss function is designed to constrain the consistency of the predicted boundaries and masks with their respective labels. Specifically, MBNet preserves prediction detail through a series of high-resolution feature maps down-sampled by 4× and adds a mixed-scale spatial attention module at the tail of the semantic branch to obtain multi-scale affinity features. At the same time, interaction between the low-resolution semantic feature maps and the high-resolution detail feature maps is maintained. Finally, the point-to-point module (PTPM) refines the semantic segmentation result using the generated boundary. Experiments on high-resolution UAV imagery of rural areas show that our method outperforms other state-of-the-art models, helping to refine the extraction of rural homesteads. This study demonstrates that MBNet is a promising candidate for building an automatic rural homestead management system.
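The abstract describes a multi-task joint loss that ties the mask and boundary predictions to their own labels but does not give its form. A minimal sketch follows. It assumes per-pixel binary cross-entropy for each task, summed with a hypothetical boundary weight `lam`. Pixels are flattened into plain lists for brevity, whereas a real implementation would operate on tensors in a deep learning framework. The function names are illustrative only.

```python
import math
from typing import Sequence

def bce(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over pixels; predictions are clamped to avoid log(0)."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)

def joint_loss(mask_pred: Sequence[float], mask_label: Sequence[float],
               edge_pred: Sequence[float], edge_label: Sequence[float],
               lam: float = 1.0) -> float:
    """Assumed joint loss: mask BCE plus lam times boundary BCE."""
    return bce(mask_pred, mask_label) + lam * bce(edge_pred, edge_label)

# Uncertain mask (all 0.5) with a perfect boundary: loss is close to ln 2.
loss = joint_loss([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0], [1.0, 0.0], [1, 0])
print(round(loss, 4))  # → 0.6931
```

Summing the two terms lets boundary errors push gradients into the shared layers alongside mask errors. `lam` is the knob that trades the two objectives off.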