An in-depth annotation of the newly discovered coronavirus (2019-nCoV) genome has revealed differences between 2019-nCoV and severe acute respiratory syndrome (SARS) or SARS-like coronaviruses. A systematic comparison identified 380 amino acid substitutions between these coronaviruses, which may have caused functional and pathogenic divergence of 2019-nCoV.
Proteins often interact with each other and form protein complexes to carry out various biochemical activities. Knowledge of the interaction sites is helpful for understanding disease mechanisms and for drug design. Accurate prediction of interaction sites from protein sequences remains a challenging task, and severely imbalanced data further degrades the performance of computational methods. In this study, we propose a deep learning method for improving the imbalanced prediction of protein interaction sites. We develop a new simplified long-short term memory (SLSTM) network to implement a deep learning architecture (named DLPred). To deal with the imbalanced classification in the deep learning model, we explore three new ideas. First, we build the training data as a set of whole protein sequences, rather than a set of single residues, to retain the sequential completeness of each protein. Second, a new penalization factor is appended to the loss function such that the penalization to the non-interaction site loss can be effectively enhanced. Third, multi-task learning of interaction sites and residue solvent accessibility prediction is used to correct the preference of the prediction model for non-interaction sites. Our model is evaluated on three public datasets: Dset186, Dtestset72 and PDBtestset164. Compared with current state-of-the-art methods, DLPred is able to significantly improve the predictive accuracies and AUC values while improving the F-measure. The training dataset, test datasets, a standalone version of DLPred and an online service are available at http://qianglab.scst.suda.edu.cn/dlp/.
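The abstract does not give the exact form of the penalization factor, so the following is only a minimal sketch of the general idea: a binary cross-entropy in which the loss term for non-interaction residues is scaled by a factor. The function name `penalized_bce` and the parameter `neg_penalty` are hypothetical and are not taken from DLPred.

```python
import math


def penalized_bce(y_true, y_prob, neg_penalty=1.0, eps=1e-12):
    """Mean binary cross-entropy with a scaling factor on the negative class.

    y_true: 1 for an interaction site, 0 for a non-interaction site.
    y_prob: predicted probability that the residue is an interaction site.
    neg_penalty: hypothetical factor applied to the non-interaction term
                 (an assumption; the form used by DLPred is not stated).
    """
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    if not y_true:
        raise ValueError("at least one residue is required")

    total = 0.0
    for label, prob in zip(y_true, y_prob):
        # Clamp the probability so that log() never receives 0.
        prob = min(max(prob, eps), 1.0 - eps)
        if label == 1:
            total += -math.log(prob)
        else:
            total += -neg_penalty * math.log(1.0 - prob)
    return total / len(y_true)
```

With `neg_penalty=1.0` this reduces to the ordinary binary cross-entropy; larger values make errors on non-interaction residues cost more.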
The RNA sequencing approach has been broadly used to provide gene-, pathway-, and network-centric analyses for various cell and tissue samples. However, thus far, the rich cellular information carried in tissue samples has not been thoroughly characterized from RNA-Seq data. Incorporating a cell-centric view of the tissue transcriptome would therefore broaden our understanding of the biological processes of the body. Here, a computational model named seq-ImmuCC was developed to infer the relative proportions of 10 major immune cells in mouse tissues from RNA-Seq data. The performance of seq-ImmuCC was evaluated across multiple computational algorithms, transcriptional platforms, and simulated and experimental datasets. The test results showed stable performance and high consistency with experimental observations under different conditions. With seq-ImmuCC, we generated a comprehensive landscape of immune cell compositions in 27 normal mouse tissues and extracted the distinct signatures of immune cell proportion among various tissue types. Furthermore, we quantitatively characterized and compared the immune cell compositions of 18 different types of mouse tumor tissues of distinct cell origins, which provided a comprehensive and informative measurement of the immune microenvironment inside tumor tissues. The online server of seq-ImmuCC is freely available at .
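Inferring relative cell proportions from bulk expression is commonly framed as a deconvolution problem: a mixture profile is modeled as a non-negative combination of per-cell-type signature profiles. The abstract does not state which algorithm seq-ImmuCC uses, so the sketch below swaps in plain non-negative least squares solved by projected gradient descent, purely as an illustration. The function name `estimate_proportions` and the toy signature values are hypothetical.

```python
def estimate_proportions(signature, mixture, steps=5000):
    """Estimate relative cell-type proportions by non-negative least squares.

    signature: list of gene rows, each holding one expression value per cell type.
    mixture:   list with one bulk expression value per gene.
    Returns weights that are non-negative and rescaled to sum to 1.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    if len(mixture) != n_genes:
        raise ValueError("mixture must have one value per signature row")

    # 1 / trace(S^T S) is no larger than 1 / (largest eigenvalue of S^T S),
    # so it is a safe step size for gradient descent on 0.5 * ||S w - m||^2.
    trace = sum(value * value for row in signature for value in row)
    step = 1.0 / trace

    weights = [1.0 / n_types] * n_types
    for _ in range(steps):
        residual = [
            sum(s * w for s, w in zip(row, weights)) - m
            for row, m in zip(signature, mixture)
        ]
        gradient = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step, then projection onto the non-negative orthant.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]

    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


# Toy example: 3 genes, 2 cell types, mixture built from proportions 0.3 / 0.7.
toy_signature = [[10.0, 1.0], [1.0, 10.0], [5.0, 5.0]]
toy_mixture = [3.7, 7.3, 5.0]
toy_result = estimate_proportions(toy_signature, toy_mixture)
```

On this noise-free toy input the estimate should recover proportions close to 0.3 and 0.7; real data would need normalization and a curated signature matrix, which are outside this sketch.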