2023
DOI: 10.1021/acs.jcim.3c00601

Linear-Scaling Kernels for Protein Sequences and Small Molecules Outperform Deep Learning While Providing Uncertainty Quantitation and Improved Interpretability

Abstract: Gaussian processes (GPs) are Bayesian models that provide several advantages for regression tasks in machine learning, such as reliable quantitation of uncertainty and improved interpretability. Their adoption has been precluded by their excessive computational cost and by the difficulty of adapting them to analyze sequences (e.g., amino acid sequences) and graphs (e.g., small molecules). In this study, we introduce a group of random feature-approximated kernels for sequences and graphs that exhibit linear sc…

citations
Cited by 7 publications
(6 citation statements)
references
References 53 publications
“…The good results achieved with a simple model are reminiscent of recent studies where linear models or approximate Gaussian processes achieved performance competitive with fine-tuned LLMs or other deep learning models on fitness landscapes (Faure et al 2023, Parkinson and Wang 2023). These results suggest it is important to use simple models as a baseline for protein engineering tasks.…”
Section: Discussion (supporting)
confidence: 57%
“…The good results achieved with a simple model are reminiscent of recent studies where linear models or approximate Gaussian processes achieved performance competitive with fine-tuned LLMs or other deep learning models on fitness landscapes [23,24]. These results suggest it is important to use simple models as a baseline for protein engineering tasks.…”
Section: Discussion (supporting)
confidence: 58%
“…The generated content can encompass a wide range of topics, such as organic chemistry, inorganic chemistry, analytical chemistry, physical chemistry, biochemistry, and other related areas. Several papers have already been published on chemistry and ChatGPT, such as drug discovery, teaching learning, computational chemistry, etc. ChatGPT can be used to provide quick and accessible information about various aspects of chemistry and it may be a valuable tool for researchers, students, and professionals. Besides, ChatGPT can explain chemical concepts in an easier language, help students understand complex topics better, and potentially aid in problem-solving.…”
Section: Introduction (mentioning)
confidence: 99%