With the explosive growth of biological sequences generated in the post-genomic era, one of the most challenging problems in bioinformatics and computational biology is to computationally characterize sequences, structures and functions in an efficient, accurate and high-throughput manner. A number of online web servers and stand-alone tools have been developed to address this to date; however, all these tools have limitations and drawbacks in terms of their effectiveness, user-friendliness and capacity. Here, we present iLearn, a comprehensive and versatile Python-based toolkit that integrates feature extraction, clustering, normalization, selection, dimensionality reduction, predictor construction, best descriptor/model selection, ensemble learning and results visualization for DNA, RNA and protein sequences. iLearn was designed for users who only want to upload their data set and select the functions they need calculated from it, while all necessary procedures and optimal settings are completed automatically by the software. iLearn includes a variety of descriptors for DNA, RNA and proteins, and four feature output formats are supported so as to facilitate direct output usage or communication with other computational tools. In total, iLearn encompasses 16 different types of feature clustering, selection, normalization and dimensionality reduction algorithms, and five commonly used machine-learning algorithms, thereby greatly facilitating feature analysis and predictor construction. iLearn is made freely available via an online web server and a stand-alone toolkit.
Supplementary data are available at Bioinformatics online.
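To make the feature extraction and normalization steps mentioned above more concrete, the following is a minimal sketch in plain Python. It does not use iLearn and does not reproduce its API; the function names, the choice of a simple amino acid composition descriptor, and the toy sequences are all assumptions made for illustration only.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature columns line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def min_max_normalize(rows):
    """Rescale each feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical toy sequences, not taken from any real data set.
sequences = ["MKTAYIAKQR", "GGGSAAAKLV", "MSTNPKPQRK"]
features = [composition(s) for s in sequences]
normalized = min_max_normalize(features)
```

Each sequence becomes a 20-dimensional vector that sums to 1, and the normalized matrix could then be passed to any clustering or classification routine.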
Motivation: Proteases are enzymes that cleave target substrate proteins by catalyzing the hydrolysis of peptide bonds between specific amino acids. While the functional proteolysis regulated by proteases plays a central role in the ‘life and death’ cellular processes, many of the corresponding substrates and their cleavage sites have not yet been identified. Availability of accurate predictors of the substrates and cleavage sites would facilitate understanding of proteases’ functions and physiological roles. Deep learning is a promising approach for the development of accurate predictors of substrate cleavage events. Results: We propose DeepCleave, the first deep learning-based predictor of protease-specific substrates and cleavage sites. DeepCleave uses protein substrate sequence data as input and employs convolutional neural networks with transfer learning to train accurate predictive models. The high predictive performance of our models stems from the use of high-quality cleavage site features extracted from the substrate sequences through the deep learning process, and from the application of transfer learning, multiple kernels and an attention layer in the design of the deep network. Empirical tests against several related state-of-the-art methods demonstrate that DeepCleave outperforms these methods in predicting caspase and matrix metalloprotease substrate-cleavage sites. Availability and implementation: The DeepCleave webserver and source code are freely available at http://deepcleave.erc.monash.edu/. Supplementary information: Supplementary data are available at Bioinformatics online.
Glycosylation plays an important role in cell-cell adhesion, ligand-binding and subcellular recognition. Current approaches for predicting protein glycosylation are primarily based on sequence-derived features, while little work has been done to systematically assess the importance of structural features to glycosylation prediction. Here, we propose a novel bioinformatics method called GlycoMinestruct (http://glycomine.erc.monash.edu/Lab/GlycoMine_Struct/) for improved prediction of human N- and O-linked glycosylation sites by combining sequence and structural features in an integrated computational framework with a two-step feature-selection strategy. Experiments indicated that GlycoMinestruct outperformed NGlycPred, the only predictor that incorporated both sequence and structure features, achieving AUC values of 0.941 and 0.922 for N- and O-linked glycosylation, respectively, on an independent test dataset. We applied GlycoMinestruct to screen the human structural proteome and obtained high-confidence predictions for N- and O-linked glycosylation sites. GlycoMinestruct can be used as a powerful tool to expedite the discovery of glycosylation events and substrates to facilitate hypothesis-driven experimental studies.
The roles of proteolytic cleavage have been intensively investigated and discussed during the past two decades. This irreversible chemical process has been frequently reported to influence a number of crucial biological processes (BPs), such as cell cycle, protein regulation and inflammation. A number of advanced studies have been published aiming at deciphering the mechanisms of proteolytic cleavage. Given its significance and the large number of functionally enriched substrates targeted by specific proteases, many computational approaches have been established for accurate prediction of protease-specific substrates and their cleavage sites. Consequently, there is an urgent need to systematically assess the state-of-the-art computational approaches for protease-specific cleavage site prediction to further advance the existing methodologies and to improve the prediction performance. With this goal in mind, in this article, we carefully evaluated a total of 19 computational methods (including 8 scoring function-based methods and 11 machine learning-based methods) in terms of their underlying algorithm, calculated features, performance evaluation and software usability. Then, extensive independent tests were performed to assess the robustness and scalability of the reviewed methods using our carefully prepared independent test data sets with 3641 cleavage sites (specific to 10 proteases). The comparative experimental results demonstrate that PROSPERous is the most accurate generic method for predicting eight protease-specific cleavage sites, while GPS-CCD and LabCaS outperformed other predictors for calpain-specific cleavage sites. Based on our review, we then outlined some potential ways to improve the prediction performance and ease the computational burden by applying ensemble learning, deep learning, positive unlabeled learning and parallel and distributed computing techniques. 
We anticipate that our study will serve as a practical and useful guide for interested readers to further advance next-generation bioinformatics tools for protease-specific cleavage site prediction.
Purpose: To measure the prevalence of medically actionable pathogenic variants (PVs) among a population of healthy elderly individuals. Methods: We used targeted sequencing to detect ‘pathogenic’ or ‘likely pathogenic’ variants in 55 genes associated with autosomal dominant medically actionable conditions, among a population of 13,131 individuals aged 70 or older (mean age 75 years) enrolled in the ASPirin in Reducing Events in the Elderly (ASPREE) trial. Participants had no previous diagnosis or current symptoms of cardiovascular disease, physical disability or dementia, and no current diagnosis of life-threatening cancer. Variant curation followed ACMG/AMP standards. Results: One in 75 (1.3%) healthy elderly individuals carried a PV. This was lower than rates reported from population-based studies, which have ranged from 1.8% to 3.4%. We detected 20 PV carriers for Lynch syndrome (MSH6/MLH1/MSH2/PMS2) and 13 for familial hypercholesterolemia (LDLR/APOB/PCSK9). Among 7056 female participants, we detected 15 BRCA1/BRCA2 PV carriers (1 in 470 females). We detected 86 carriers of PVs in lower-penetrance genes associated with inherited cardiac disorders. Conclusion: Medically actionable PVs are carried in a healthy elderly population. Our findings raise questions about the actionability of lower-penetrance genes, especially when PVs are detected in the absence of symptoms and/or family history of disease.
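The carrier frequencies quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the abstract; the helper names are hypothetical and chosen for illustration.

```python
def one_in(carriers, population):
    """Express a carrier count as a '1 in N' ratio (N is not rounded)."""
    return population / carriers


def as_percent(one_in_n):
    """Convert a '1 in N' ratio to a percentage."""
    return 100.0 / one_in_n


# 15 BRCA1/BRCA2 carriers among 7056 female participants.
brca_ratio = one_in(15, 7056)

# The overall rate is reported as 1 in 75.
overall_percent = as_percent(75)

print(f"BRCA1/BRCA2: 1 in {brca_ratio:.0f} females")
print(f"Overall PV carrier rate: {overall_percent:.1f}%")
```

The unrounded ratio for BRCA1/BRCA2 is 470.4, which agrees with the reported 1 in 470, and 1 in 75 corresponds to the reported 1.3%.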
Post-translational modifications (PTMs) play very important roles in various cell signaling pathways and biological processes. Because of these roles, many major PTMs have been studied, and their functional and mechanical characterization is well documented in several databases. However, most currently available databases mainly focus on protein sequences, while the real 3D structures of PTMs have been largely ignored. As a result, studies of the 3D structural signatures of PTMs have been severely limited by the lack of data. Here, we develop PRISMOID, a novel, free and publicly available 3D structure database for a wide range of PTMs. PRISMOID is an up-to-date and interactive online knowledge base with a specific focus on the 3D structural contexts of PTM sites and on functionally relevant mutations that occur at or in close proximity to PTM sites. The first version of PRISMOID encompasses 17 145 non-redundant modification sites on 3919 related protein 3D structure entries pertaining to 37 different types of PTMs. Our entry web page is organized in a comprehensive manner, including detailed PTM annotation on the 3D structure and biological information in terms of mutations affecting PTMs, secondary structure features and per-residue solvent accessibility features of PTM sites, domain context, predicted natively disordered regions and sequence alignments. In addition, high-definition JavaScript packages are employed to enhance information visualization in PRISMOID. PRISMOID provides a variety of interactive and customizable search options and data browsing functions; these capabilities allow users to access data via keyword, ID and advanced combination searches in an efficient and user-friendly way. A download page is also provided to enable users to download the SQL file, computational structural features and PTM site data.
We anticipate PRISMOID will swiftly become an invaluable online resource, assisting both biologists and bioinformaticians in conducting experiments and developing applications that support discovery efforts in the sequence–structural–functional relationship of PTMs, and providing important insight into the interaction mechanisms between mutations and PTM sites. The PRISMOID database is freely accessible at http://prismoid.erc.monash.edu/. The database and web interface are implemented in MySQL, JSP, JavaScript and HTML with all major browsers supported.
Bacteria translocate effector molecules to host cells through highly evolved secretion systems. By definition, the function of these effector proteins is to manipulate host cell biology, and the sequence, structural and functional annotations of these effector proteins will provide a better understanding of how bacterial secretion systems promote bacterial survival and virulence. Here we developed a knowledgebase, termed SecretEPDB (Bacterial Secreted Effector Protein DataBase), for effector proteins of the type III secretion system (T3SS), type IV secretion system (T4SS) and type VI secretion system (T6SS). SecretEPDB provides enriched annotations of these three classes of effector proteins by manually extracting and integrating structural and functional information from currently available databases and the literature. The database is conservative and strictly curated to ensure that every effector protein entry is supported by experimental evidence that demonstrates it is secreted by a T3SS, T4SS or T6SS. The annotations of effector proteins documented in SecretEPDB are provided in terms of protein characteristics, protein function, protein secondary structure, Pfam domains, metabolic pathway and evolutionary details. It is our hope that this integrated knowledgebase will serve as a useful resource for biological investigation and the generation of new hypotheses for research efforts aimed at bacterial secretion systems.