A new framework for evaluating model out-of-distribution for the biochemical domain

Fernández-Díaz, Raúl; Hoang, Thanh Lam; Lopez, Vanessa; Shields, Denis C.

doi:10.1101/2024.03.14.584508

2024

Preprint

Abstract: We have developed Hestia, a computational tool that provides a unified framework for introducing similarity correction techniques across different biochemical data types. We propose a new strategy for dividing a dataset into training and evaluation subsets (CCPart) and have compared it against other methods at different thresholds to explore the impact that these choices have on model generalisation evaluation, through the lens of overfitting diagnosis. We have trained molecular language models for protein seq…

Cited by 1 publication

References 46 publications

AutoPeptideML: A study on how to build more trustworthy peptide bioactivity predictors

Fernandez-Diaz, Cossio-Pérez, Agoni et al. 2023

Preprint

Self Cite

Automated machine learning (AutoML) solutions can bridge the gap between new computational advances and their real-world applications by enabling experimental scientists to build trustworthy models. Here, we consider the design of such a tool for developing peptide bioactivity predictors. We analyse different design choices concerning data acquisition and negative class definition, homology partitioning for the construction of independent evaluation sets, the use of protein language models as a general sequence representation method, and model selection and hyperparameter optimisation. Finally, we integrate the conclusions drawn from this study into AutoPeptideML, an end-to-end, user-friendly application that enables experimental researchers to build trustworthy models, facilitating compliance with community guidelines. The source code, documentation, and data are available in the project GitHub repository: https://github.com/IBM/AutoPeptideML. Additionally, we have established a dedicated web-server, accessible at: http://peptide.ucd.ie/AutoPeptideML.
