HydrogelFinder: A Foundation Model for Efficient Self‐Assembling Peptide Discovery Guided by Non‐Peptidal Small Molecules

Ren, Xuanbai; Wei, Jiaying; Luo, Xiaoli; Liu, Yuansheng; Li, Kenli; Zhang, Qiang; Gao, Xin; Yan, Sizhe; Wu, Xia; Jiang, Xingyue; Liu, Mingquan; Cao, Dongsheng; Wei, Leyi; Zeng, Xiangxiang; Shi, Junfeng

doi:10.1002/advs.202400829

Advanced Science

2024

DOI: 10.1002/advs.202400829

|View full text |Cite

HydrogelFinder: A Foundation Model for Efficient Self‐Assembling Peptide Discovery Guided by Non‐Peptidal Small Molecules

Xuanbai Ren,

Jiaying Wei,

Xiaoli Luo

et al.

Abstract: Self‐assembling peptides have numerous applications in medicine, food chemistry, and nanotechnology. However, their discovery has traditionally been serendipitous rather than driven by rational design. Here, HydrogelFinder, a foundation model is developed for the rational design of self‐assembling peptides from scratch. This model explores the self‐assembly properties by molecular structure, leveraging 1,377 self‐assembling non‐peptidal small molecules to navigate chemical space and improve structural diversit… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

Supporting

Mentioning

Contrasting

Year Published

2024

Publication Types

Select...

Article4

Relationship

Self Cite0

Independent4

Authors

Journals

Cited by 4 publications

References 76 publications

(116 reference statements)

Supporting

Mentioning

Contrasting

Order By: Relevance

Glypred: Lysine Glycation Site Prediction via CCU–LightGBM–BiLSTM Framework with Multi-Head Attention Mechanism

Zuo,

Zhang,

Dong

et al. 2024

J. Chem. Inf. Model.

View full text Add to dashboard Cite

Glycation, a type of posttranslational modification, preferentially occurs on lysine and arginine residues, impairing protein functionality and altering characteristics. This process is linked to diseases such as Alzheimer’s, diabetes, and atherosclerosis. Traditional wet lab experiments are time-consuming, whereas machine learning has significantly streamlined the prediction of protein glycation sites. Despite promising results, challenges remain, including data imbalance, feature redundancy, and suboptimal classifier performance. This research introduces Glypred, a lysine glycation site prediction model combining ClusterCentroids Undersampling (CCU), LightGBM, and bidirectional long short-term memory network (BiLSTM) methodologies, with an additional multihead attention mechanism integrated into the BiLSTM. To achieve this, the study undertakes several key steps: selecting diverse feature types to capture comprehensive protein information, employing a cluster-based undersampling strategy to balance the data set, using LightGBM for feature selection to enhance model performance, and implementing a bidirectional LSTM network for accurate classification. Together, these approaches ensure that Glypred effectively identifies glycation sites with high accuracy and robustness. For feature encoding, five distinct feature typesAAC, KMER, DR, PWAA, and EBGWwere selected to capture a broad spectrum of protein sequence and biological information. These encoded features were integrated and validated to ensure comprehensive protein information acquisition. To address the issue of highly imbalanced positive and negative samples, various undersampling algorithms, including random undersampling, NearMiss, edited nearest neighbor rule, and CCU, were evaluated. CCU was ultimately chosen to remove redundant nonglycated training data, establishing a balanced data set that enhances the model’s accuracy and robustness. For feature selection, the LightGBM ensemble learning algorithm was employed to reduce feature dimensionality by identifying the most significant features. This approach accelerates model training, enhances generalization capabilities, and ensures good transferability of the model. Finally, a bidirectional long short-term memory network was used as the classifier, with a network structure designed to capture glycation modification site features from both forward and backward directions. To prevent overfitting, appropriate regularization parameters and dropout rates were introduced, achieving efficient classification. Experimental results show that Glypred achieved optimal performance. This model provides new insights for bioinformatics and encourages the application of similar strategies in other fields. A lysine glycation site prediction software tool was also developed using the PyQt5 library, offering researchers an auxiliary screening tool to reduce workload and improve efficiency. The software and data sets are available on GitHub: .

show abstract

Glypred: Lysine Glycation Site Prediction via CCU–LightGBM–BiLSTM Framework with Multi-Head Attention Mechanism

Zuo,

Zhang,

Dong

et al. 2024

J. Chem. Inf. Model.

View full text Add to dashboard Cite

show abstract

iNP_ESM: Neuropeptide Identification Based on Evolutionary Scale Modeling and Unified Representation Embedding Features

Li,

Jiang,

Yang

et al. 2024

IJMS

View full text Add to dashboard Cite

Neuropeptides are biomolecules with crucial physiological functions. Accurate identification of neuropeptides is essential for understanding nervous system regulatory mechanisms. However, traditional analysis methods are expensive and laborious, and the development of effective machine learning models continues to be a subject of current research. Hence, in this research, we constructed an SVM-based machine learning neuropeptide predictor, iNP_ESM, by integrating protein language models Evolutionary Scale Modeling (ESM) and Unified Representation (UniRep) for the first time. Our model utilized feature fusion and feature selection strategies to improve prediction accuracy during optimization. In addition, we validated the effectiveness of the optimization strategy with UMAP (Uniform Manifold Approximation and Projection) visualization. iNP_ESM outperforms existing models on a variety of machine learning evaluation metrics, with an accuracy of up to 0.937 in cross-validation and 0.928 in independent testing, demonstrating optimal neuropeptide recognition capabilities. We anticipate improved neuropeptide data in the future, and we believe that the iNP_ESM model will have broader applications in the research and clinical treatment of neurological diseases.

show abstract

Research on Bitter Peptides in the Field of Bioinformatics: A Comprehensive Review

Liu,

Shi,

et al. 2024

IJMS

View full text Add to dashboard Cite

Bitter peptides are small molecular peptides produced by the hydrolysis of proteins under acidic, alkaline, or enzymatic conditions. These peptides can enhance food flavor and offer various health benefits, with attributes such as antihypertensive, antidiabetic, antioxidant, antibacterial, and immune-regulating properties. They show significant potential in the development of functional foods and the prevention and treatment of diseases. This review introduces the diverse sources of bitter peptides and discusses the mechanisms of bitterness generation and their physiological functions in the taste system. Additionally, it emphasizes the application of bioinformatics in bitter peptide research, including the establishment and improvement of bitter peptide databases, the use of quantitative structure–activity relationship (QSAR) models to predict bitterness thresholds, and the latest advancements in classification prediction models built using machine learning and deep learning algorithms for bitter peptide identification. Future research directions include enhancing databases, diversifying models, and applying generative models to advance bitter peptide research towards deepening and discovering more practical applications.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

HydrogelFinder: A Foundation Model for Efficient Self‐Assembling Peptide Discovery Guided by Non‐Peptidal Small Molecules

Cited by 4 publications

References 76 publications

Glypred: Lysine Glycation Site Prediction via CCU–LightGBM–BiLSTM Framework with Multi-Head Attention Mechanism

Glypred: Lysine Glycation Site Prediction via CCU–LightGBM–BiLSTM Framework with Multi-Head Attention Mechanism

iNP_ESM: Neuropeptide Identification Based on Evolutionary Scale Modeling and Unified Representation Embedding Features

Research on Bitter Peptides in the Field of Bioinformatics: A Comprehensive Review

Contact Info

Product

Resources

About