Cell-penetrating peptides (CPPs) have been shown to be a transport vehicle for delivering cargoes into live cells, offering great potential as future therapeutics. It is essential to identify CPPs for better understanding of their functional mechanisms. Machine learning-based methods have recently emerged as a main approach for computational identification of CPPs. However, one of the main challenges and difficulties is to propose an effective feature representation model that sufficiently exploits the inner difference and relevance between CPPs and non-CPPs, in order to improve the predictive performance. In this paper, we have developed CPPred-FL, a powerful bioinformatics tool for fast, accurate and large-scale identification of CPPs. In our predictor, we introduce a new feature representation learning scheme that enables one to learn feature representations from totally 45 well-trained random forest models with multiple feature descriptors from different perspectives, such as compositional information, position-specific information and physicochemical properties, etc. We integrate class and probabilistic information into our feature representations. To improve the feature representation ability, we further remove redundant and irrelevant features by feature space optimization. Benchmarking experiments showed that CPPred-FL, using 19 informative features only, is able to achieve better performance than the state-of-the-art predictors. We anticipate that CPPred-FL will be a powerful tool for large-scale identification of CPPs, facilitating the characterization of their functional mechanisms and accelerating their applications in clinical therapy.
As one of the well-studied RNA methylation modifications, N6-methyladenosine (m6A) plays important roles in various biological progresses, such as RNA splicing and degradation, etc. Identification of m6A sites is fundamentally important for better understanding of their functional mechanisms. Recently, machine learning based prediction methods have emerged as an effective approach for fast and accurate identification of m6A sites. In this paper, we proposed “M6AMRFS”, a new machine learning based predictor for the identification of m6A sites. In this predictor, we exploited a new feature representation algorithm to encode RNA sequences with two feature descriptors (dinucleotide binary encoding and Local position-specific dinucleotide frequency), and used the F-score algorithm combined with SFS (Sequential Forward Search) to enhance the feature representation ability. To predict m6A sites, we employed the eXtreme Gradient Boosting (XGBoost) algorithm to build a predictive model. Benchmarking results showed that the proposed predictor is competitive with the state-of-the art predictors. Importantly, robust predictions for multiple species by our predictor demonstrate that our predictive models have strong generalization ability. To the best of our knowledge, M6AMRFS is the first tool that can be used for the identification of m6A sites in multiple species. To facilitate the use of our predictor, we have established a user-friendly webserver with the implementation of M6AMRFS, which is currently available in http://server.malab.cn/M6AMRFS/. We anticipate that it will be a useful tool for the relevant research of m6A sites.
Motivation Recently, peptides have emerged as a promising class of pharmaceuticals for various diseases treatment poised between traditional small molecule drugs and therapeutic proteins. However, one of the key bottlenecks preventing them from therapeutic peptides is their toxicity toward human cells, and few available algorithms for predicting toxicity are specially designed for short-length peptides. Results We present ToxIBTL, a novel deep learning framework by utilizing the information bottleneck principle and transfer learning to predict the toxicity of peptides as well as proteins. Specifically, we use evolutionary information and physicochemical properties of peptide sequences and integrate the information bottleneck principle into a feature representation learning scheme, by which relevant information is retained and the redundant information is minimized in the obtained features. Moreover, transfer learning is introduced to transfer the common knowledge contained in proteins to peptides, which aims to improve the feature representation capability. Extensive experimental results demonstrate that ToxIBTL not only achieves a higher prediction performance than state-of-the-art methods on the peptide dataset, but also has a competitive performance on the protein dataset. Furthermore, a user-friendly online web server is established as the implementation of the proposed ToxIBTL. Availability The proposed ToxIBTL can be freely accessible at http://server.wei-group.net/ToxIBTL. Supplementary information Supplementary data are available at Bioinformatics online.
Motivation: Peptides have recently emerged as promising therapeutic agents against various diseases. For both research and safety regulation purposes, it is of high importance to develop computational methods to accurately predict the potential toxicity of peptides within the vast number of candidate peptides. Results: In this study, we proposed ATSE, a peptide toxicity predictor by exploiting structural and evolutionary information based on graph neural networks and attention mechanism. More specifically, it consists of four modules: (i) a sequence processing module for converting peptide sequences to molecular graphs and evolutionary profiles, (ii) a feature extraction module designed to learn discriminative features from graph structural information and evolutionary information, (iii) an attention module employed to optimize the features and (iv) an output module determining a peptide as toxic or non-toxic, using optimized features from the attention module. Conclusion: Comparative studies demonstrate that the proposed ATSE significantly outperforms all other competing methods. We found that structural information is complementary to the evolutionary information, effectively improving the predictive performance. Importantly, the data-driven features learned by ATSE can be interpreted and visualized, providing additional information for further analysis. Moreover, we present a user-friendly online computational platform that implements the proposed ATSE, which is now available at http://server.malab.cn/ATSE. We expect that it can be a powerful and useful tool for researchers of interest.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.