2024
DOI: 10.1101/2024.03.14.584508
Preprint

A new framework for evaluating model out-of-distribution for the biochemical domain

Raúl Fernández-Díaz,
Thanh Lam Hoang,
Vanessa Lopez
et al.

Abstract: We have developed Hestia, a computational tool that provides a unified framework for introducing similarity correction techniques across different biochemical data types. We propose a new strategy for dividing a dataset into training and evaluation subsets (CCPart) and have compared it against other methods at different thresholds to explore the impact that these choices have on model generalisation evaluation, through the lens of overfitting diagnosis. We have trained molecular language models for protein seq…
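
The abstract does not describe how CCPart or Hestia work internally, so the following is only a generic sketch of what a similarity-aware train/evaluation split at a given threshold can look like. It is not Hestia's API and should not be read as the CCPart algorithm. The names `similarity` and `component_split`, the 3-mer Jaccard similarity, and the connected-component grouping are all assumptions chosen for illustration.

```python
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Toy similarity: Jaccard index over character 3-mers.

    Hypothetical stand-in for a real measure such as alignment identity.
    """
    ka = {a[i:i + 3] for i in range(len(a) - 2)}
    kb = {b[i:i + 3] for i in range(len(b) - 2)}
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def component_split(items, threshold, test_fraction=0.2):
    """Split item indices so no train/test pair reaches `threshold`.

    Items are linked when their similarity is >= threshold; each connected
    component of that graph is then assigned as a whole to one subset.
    """
    parent = list(range(len(items)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of items that is at least `threshold` similar.
    for i, j in combinations(range(len(items)), 2):
        if similarity(items[i], items[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect the members of each connected component.
    components = {}
    for i in range(len(items)):
        components.setdefault(find(i), []).append(i)

    # Fill the test set with the smallest components that still fit.
    target = test_fraction * len(items)
    train, test = [], []
    for members in sorted(components.values(), key=len):
        if len(test) + len(members) <= target:
            test.extend(members)
        else:
            train.extend(members)
    return sorted(train), sorted(test)


# Made-up sequences: two near-duplicate pairs and one unrelated item.
seqs = ["MKTAYIAKQR", "MKTAYIAKQK", "GGSLVPRGSH", "GGSLVPRGSM", "PPWWCCDDEE"]
train_idx, test_idx = component_split(seqs, threshold=0.5, test_fraction=0.2)
```

Because whole components move together, near-duplicates never straddle the split. The trade-off is that the evaluation subset can end up smaller than requested when components are large, and lowering the threshold merges more items into fewer, larger components.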

Cited by 1 publication
References 46 publications