Data from patients with coronavirus disease 2019 (COVID-19) are essential for guiding clinical decision making, for furthering the understanding of this viral disease, and for diagnostic modelling. Here, we describe an open resource containing data from 1,521 patients with pneumonia (including COVID-19 pneumonia) consisting of chest computed tomography (CT) images, 130 clinical features (from a range of biochemical and cellular analyses of blood and urine samples) and laboratory-confirmed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) clinical status. We show the utility of the database for prediction of COVID-19 morbidity and mortality outcomes using a deep learning algorithm trained with data from 1,170 patients and 19,685 manually labelled CT slices. In an independent validation cohort of 351 patients, the algorithm discriminated between negative, mild and severe cases with areas under the receiver operating characteristic curve of 0.944, 0.860 and 0.884, respectively. The open database may have further uses in the diagnosis and management of patients with COVID-19.
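The three areas under the curve quoted above correspond to a one-vs-rest evaluation of a three-class predictor. The following is a minimal sketch of how such per-class values could be computed, assuming each patient has one predicted probability per class; the function names and the toy probabilities are illustrative assumptions, not the authors' code or data.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob_rows, true_classes, class_names):
    """Compute one AUC per class, treating that class as positive and the rest as negative."""
    result = {}
    for k, name in enumerate(class_names):
        scores = [row[k] for row in prob_rows]
        labels = [1 if c == name else 0 for c in true_classes]
        result[name] = roc_auc(scores, labels)
    return result


# Toy predictions (hypothetical): columns are P(negative), P(mild), P(severe).
probs = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.5, 0.3],
]
truth = ["negative", "negative", "mild", "mild", "severe", "severe"]
aucs = one_vs_rest_auc(probs, truth, ["negative", "mild", "severe"])
print(aucs)
```

The pairwise formulation is quadratic in the number of patients, which is acceptable for a cohort of a few hundred; a rank-based implementation would be preferable for larger data.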
Here, we presented an integrative database named DrLLPS (http://llps.biocuckoo.cn/) for proteins involved in liquid–liquid phase separation (LLPS), a ubiquitous and crucial mechanism that organizes various biochemical reactions in space and time by creating membraneless organelles (MLOs) in eukaryotic cells. From the literature, we manually collected 150 scaffold proteins that are drivers of LLPS, 987 regulators that contribute to modulating LLPS, and 8148 potential client proteins that might be dispensable for the formation of MLOs, which were then categorized into 40 biomolecular condensates. We searched for potential orthologs of these known proteins, and in total DrLLPS contained 437 887 known and potential LLPS-associated proteins in 164 eukaryotes. Furthermore, we carefully annotated LLPS-associated proteins in eight model organisms, using the knowledge integrated from 110 widely used resources that covered 16 aspects, including protein disordered regions, domain annotations, post-translational modifications (PTMs), genetic variations, cancer mutations, molecular interactions, disease-associated information, drug-target relations, physicochemical properties, protein functional annotations, protein expressions/proteomics, protein 3D structures, subcellular localizations, mRNA expressions, DNA & RNA elements, and DNA methylations. We anticipate that DrLLPS can serve as a helpful resource for further analysis of LLPS.
As an important reversible lipid modification, S-palmitoylation mainly occurs at specific cysteine residues in proteins, participates in regulating various biological processes and is associated with human diseases. Besides experimental assays, computational prediction of S-palmitoylation sites can efficiently generate helpful candidates for further experimental consideration. Here, we reviewed the current progress in the development of S-palmitoylation site predictors, as well as the training data sets, informative features and algorithms used in these tools. Then, we compiled a benchmark data set containing 3098 known S-palmitoylation sites identified from small- or large-scale experiments, and developed a new method named data quality discrimination (DQD) to distinguish data quality weights (DQWs) between the two types of sites. Besides DQD and our previous methods, we encoded sequence similarity values into images, constructed a deep learning framework of convolutional neural networks (CNNs) and developed a novel algorithm of graphic presentation system (GPS) 6.0. We further integrated nine additional types of sequence-based and structural features, implemented parallel CNNs (pCNNs) and designed a new predictor called GPS-Palm. Compared with other existing tools, GPS-Palm showed a >31.3% improvement in the area under the curve (AUC) value (0.855 versus 0.651) for general prediction of S-palmitoylation sites. We also produced two species-specific predictors, with corresponding AUC values of 0.900 and 0.897 for predicting human- and mouse-specific sites, respectively. GPS-Palm is freely available for academic research at http://gpspalm.biocuckoo.cn/.
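The reported improvement can be checked from the two AUC values given above, assuming it is a relative gain over the baseline AUC of 0.651. A short sketch of that arithmetic (the function name is illustrative):

```python
def relative_improvement(new_value, baseline):
    """Relative gain of new_value over baseline, as a fraction of the baseline."""
    return (new_value - baseline) / baseline


# AUC values quoted in the abstract: GPS-Palm (0.855) versus the compared tool (0.651).
gain = relative_improvement(0.855, 0.651)
print(f"{gain:.1%}")
```

The gain is about 31.3% of the baseline value, which matches the figure quoted in the abstract; the absolute difference in AUC is 0.204.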
Drug combinations are frequently used for the treatment of cancer patients in order to increase efficacy, decrease adverse side effects, or overcome drug resistance. Given the enormous number of drug combinations, it is costly and time-consuming to screen all possible drug pairs experimentally. The integration of multiple networks to predict synergistic drug combinations using recently developed deep learning technologies has not yet been fully explored. In this study, we proposed a Graph Convolutional Network (GCN) model to predict synergistic drug combinations in particular cancer cell lines. Specifically, the GCN method used a convolutional neural network model to do heterogeneous graph embedding, and thus solved a link prediction task. The graph in this study was a multimodal graph, which was constructed by integrating the drug-drug combination, drug-protein interaction, and protein-protein interaction networks. We found that the GCN model was able to correctly predict cell line-specific synergistic drug combinations from a large heterogeneous network. The majority (30) of the 39 cell line-specific models showed an area under the receiver operating characteristic curve (AUC) larger than 0.80, resulting in a mean AUC of 0.84. Moreover, we conducted an in-depth literature survey to investigate the top predicted drug combinations in specific cancer cell lines and found that many of them have been reported to show synergistic antitumor activity against the same or other cancers in vitro or in vivo. Taken together, the results indicate that our study provides a promising way to better predict and optimize synergistic drug pairs in silico.
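The idea of embedding graph nodes and then scoring candidate links can be illustrated with a deliberately simplified sketch. It assumes a single untyped adjacency matrix, the standard symmetric-normalized propagation rule for one graph convolutional layer, and an inner-product decoder; the toy graph, the weights and all function names are hypothetical, and the model described in the abstract operates on a multimodal graph that is not reproduced here.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(adj):
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj_norm, features, weights):
    """One graph convolution: ReLU(adj_norm @ features @ weights)."""
    z = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in z]


def link_score(embeddings, i, j):
    """Inner-product decoder: sigmoid of the dot product of two node embeddings."""
    dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
    return 1.0 / (1.0 + math.exp(-dot))


# Toy graph (hypothetical): nodes 0 and 1 are drugs, nodes 2 and 3 are proteins.
# Edges: drug0-protein2, drug1-protein2, protein2-protein3.
adj = [
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
weights = [[0.5, -0.2], [0.1, 0.4], [0.3, 0.3], [-0.1, 0.2]]  # untrained, fixed for illustration

adj_norm = normalized_adjacency(adj)
emb = gcn_layer(adj_norm, features, weights)
score = link_score(emb, 0, 1)  # score for a candidate drug0-drug1 link
print(score)
```

In a trained model the weights would be learned so that known synergistic pairs receive high scores; here they are fixed constants, so the printed score carries no biological meaning.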
As an important protein acylation modification, lysine succinylation (Ksucc) is involved in diverse biological processes, and participates in human tumorigenesis. Here, we collected 26,243 non-redundant known Ksucc sites from 13 species as the benchmark data set, combined 10 types of informative features, and implemented a hybrid-learning architecture by integrating deep-learning and conventional machine-learning algorithms into a single framework. We constructed a new tool named HybridSucc, which achieved area under curve (AUC) values of 0.885 and 0.952 for general and human-specific prediction of Ksucc sites, respectively. In comparison, the accuracy of HybridSucc was 17.84%–50.62% better than that of other existing tools. Using HybridSucc, we conducted a proteome-wide prediction and prioritized 370 cancer mutations that change Ksucc states of 218 important proteins, including PKM2, SHMT2, and IDH2. We not only developed a high-profile tool for predicting Ksucc sites, but also generated useful candidates for further experimental consideration. The online service of HybridSucc can be freely accessed for academic research at http://hybridsucc.biocuckoo.org/.
The outbreak of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was initially reported in Wuhan, China, in December 2019. Here, we reported a timely and comprehensive resource named iCTCF to archive 256,356 chest computed tomography (CT) images, 127 types of clinical features (CFs), and laboratory-confirmed SARS-CoV-2 clinical status from 1,170 patients, reaching a data volume of 38.2 GB. To facilitate COVID-19 diagnosis, we integrated the heterogeneous CT and CF datasets, and developed a novel framework of Hybrid-learning for UnbiaSed predicTion of COVID-19 patients (HUST-19) to distinguish negative cases, mild/regular patients and severe/critically ill patients. Although both CT images and CFs are informative in predicting patients with or without COVID-19 pneumonia, the integration of CT and CF datasets achieved a striking accuracy with an area under the curve (AUC) value of 0.978, much higher than that obtained when exclusively using either CT (0.919) or CF data (0.882). Together with HUST-19, iCTCF can serve as a fundamental resource for improving the diagnosis and management of COVID-19 patients. Authors Wanshan Ning, Shijun Lei, Jingjing Yang, and Yukun Cao contributed equally to this work.
Motivation: Combination therapies have been widely used to treat cancers. However, experimentally screening synergistic drug pairs is costly and time-consuming because of the enormous number of possible drug combinations. Thus, computational methods have become an important way to predict and prioritize synergistic drug pairs. Results: We proposed a Deep Tensor Factorization (DTF) model, which integrated a tensor factorization method and a deep neural network (DNN), to predict drug synergy. The former extracts latent features from drug synergy information while the latter constructs a binary classifier to predict the drug synergy status. Compared to the tensor-based method, the DTF model performed better in predicting drug synergy. The area under the precision-recall curve (PR AUC) was 0.58 for DTF and 0.24 for the tensor method. We also compared the DTF model with DeepSynergy and logistic regression models, and found that DTF outperformed the logistic regression model and achieved performance similar to that of DeepSynergy on several classification metrics. Applying the DTF model to predict missing entries in our drug-cell line tensor, we identified novel synergistic drug combinations for 10 cell lines from the 5 cancer types. A literature survey showed that some of these predicted drug synergies have been identified in vivo or in vitro. Thus, the DTF model could be a valuable in silico tool for prioritizing novel synergistic drug combinations. Availability: Source code and data are available at https://github.com/ZexuanSun/DTF-Drug-Synergy. Supplementary information: Supplementary data are available at Bioinformatics online.
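The two-stage idea in the Results above (latent features from a tensor factorization, then a binary classifier) can be sketched as follows. This is an illustration under stated assumptions, not the code in the linked repository: the tensor is assumed to be indexed by (drug, drug, cell line), the factorization is assumed to be a rank-2 CP (CANDECOMP/PARAFAC) model whose factor matrices are hard-coded rather than fitted, and a single logistic unit stands in for the deep neural network. All names are hypothetical.

```python
import math


def cp_reconstruct(drug_a, drug_b, cell, i, j, k):
    """CP model entry: sum over rank r of drug_a[i][r] * drug_b[j][r] * cell[k][r]."""
    return sum(x * y * z for x, y, z in zip(drug_a[i], drug_b[j], cell[k]))


def latent_features(drug_a, drug_b, cell, i, j, k):
    """Concatenate the latent vectors of drug i, drug j and cell line k."""
    return drug_a[i] + drug_b[j] + cell[k]


def logistic_classifier(features, weights, bias):
    """Stand-in for the DNN: one logistic unit giving P(synergistic)."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical rank-2 factor matrices (one row per drug or cell line).
drug_a = [[1.0, 0.0], [0.5, 2.0]]
drug_b = [[2.0, 1.0], [0.0, 3.0]]
cell = [[1.0, 1.0], [4.0, 0.5]]

# Estimated synergy score for a (possibly missing) tensor entry.
estimate = cp_reconstruct(drug_a, drug_b, cell, 1, 1, 1)

# Classifier input for the triple (drug 1, drug 0, cell line 1).
x = latent_features(drug_a, drug_b, cell, 1, 0, 1)
prob = logistic_classifier(x, weights=[0.0] * len(x), bias=0.0)
print(estimate, x, prob)
```

With all-zero weights the classifier returns 0.5 for any input; in practice both the factor matrices and the classifier parameters would be learned from observed synergy data.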
Artificial intelligence (AI)-based drug design has great promise to fundamentally change the landscape of the pharmaceutical industry. Even though there has been great progress from handcrafted feature-based machine learning models, 3D convolutional neural networks (CNNs) and graph neural networks, effective and efficient representations that characterize the structural, physical, chemical and biological properties of molecular structures and interactions remain a great challenge. Here, we propose an equal-sized molecular 2D image representation, known as the molecular persistent spectral image (Mol-PSI), and combine it with a CNN model for AI-based drug design. Mol-PSI provides a unique one-to-one image representation for molecular structures and interactions. In general, deep models are empowered to achieve better performance with systematically organized representations in image format. A well-designed parallel CNN architecture for adapting Mol-PSIs is developed for protein–ligand binding affinity prediction. Our results for the three most commonly used databases, PDBbind-v2007, PDBbind-v2013 and PDBbind-v2016, are, as far as we know, better than those of all traditional machine learning models. Our Mol-PSI model provides a powerful molecular representation that can be widely used in AI-based drug design and molecular data analysis.