2022
DOI: 10.1101/2022.10.28.514293
Preprint

Tuned Fitness Landscapes for Benchmarking Model-Guided Protein Design

Abstract: Advancements in DNA synthesis and sequencing technologies have enabled a novel paradigm of protein design where machine learning (ML) models trained on experimental data are used to guide exploration of a protein fitness landscape. ML-guided directed evolution (MLDE) builds on the success of traditional directed evolution and unlocks strategies which make more efficient use of experimental data. Building an MLDE pipeline involves many design choices across the design-build-test-learn loop ranging from data col…

Cited by 5 publications (8 citation statements)
References 70 publications
“…Our previous work demonstrated a method for inferring ruggedness using an FCN model trained to take in mutational spread data from a single starting location [42]. Similar landscape inference techniques have been performed using sequence alignment data, which assuming sufficient data exists online, also do not require sequencing during a directed evolution experiment [43].…”
Section: Discussion (mentioning)
confidence: 99%
“…Interacting residues near the active site of enzymes are likely to have more epistatic combinations of mutations, and the effects of mutations at these sites may be harder to predict. Similarly, studies should also explore how fitness landscapes are similar or different between different types of proteins, i.e., binding proteins, enzymes, and synthetic landscapes developed using evolutionary priors. Ultimately, combinatorial mutagenesis data sets on additional protein families are necessary for understanding when MLDE is useful.…”
Section: Navigating Protein Fitness Landscapes Using Machine Learning (mentioning)
confidence: 99%
“…Ultimately, combinatorial mutagenesis data sets on additional protein families are necessary for understanding when MLDE is useful. In addition to developing high-throughput assays to map protein sequences to fitnesses, it will be important to develop general and realistic mathematical models to describe protein fitness landscapes (Figure A).…”
Section: Navigating Protein Fitness Landscapes Using Machine Learning (mentioning)
confidence: 99%
“…Proteins can be engineered to improve them for applications ranging from chemical manufacturing to diagnostics and therapeutics. Directed evolution (DE) is a powerful protein engineering method that optimizes protein fitness by greedy hill climbing in amino acid sequence space. Recently, machine learning (ML) has emerged as a useful tool to complement DE and accelerate protein engineering, as has been done with channelrhodopsins, adeno-associated viruses, enzymes, and other proteins. In ML-assisted protein engineering (MLPE), ML models are trained on data to learn a mapping between protein sequences and their associated fitness values to approximate protein fitness landscapes. These trained models can then predict the fitness of previously unseen protein variants, increasing screening efficiency by evaluating proteins in silico and expanding exploration to a greater scope of sequences, compared to conventional DE approaches. For instance, in machine learning-assisted directed evolution (MLDE), an ML model is trained on a small sample of variants in a multisite simultaneous mutagenesis (combinatorial) library and then used to predict fitness and rank all variants within the combinatorial space.…”
Section: Introduction (mentioning)
confidence: 99%
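
The MLDE workflow described in the last citation statement above (train a model on a small sample of a combinatorial library, then predict and rank every variant in that library) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the preprint or the citing papers: the 4-letter alphabet, the 3-site library, the purely additive ground-truth landscape (`toy_fitness`), and the crude per-site averaging model (`fit_additive_model`). A real pipeline would use measured fitness values and a proper regression model.

```python
import itertools
import random

ALPHABET = "ACDE"  # toy alphabet (assumption); real proteins use 20 amino acids
N_SITES = 3        # number of simultaneously mutated sites (assumption)


def toy_fitness(variant, effects):
    """Hypothetical additive ground truth: sum of per-site residue effects."""
    return sum(effects[(i, aa)] for i, aa in enumerate(variant))


def fit_additive_model(train):
    """Score each (site, residue) pair by the mean fitness of the
    training variants that carry it. A crude stand-in for a real ML model."""
    sums, counts = {}, {}
    for variant, y in train:
        for key in enumerate(variant):  # key is (site index, residue)
            sums[key] = sums.get(key, 0.0) + y
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def predict(model, variant, default=0.0):
    """Uncalibrated score used only for ranking; unseen pairs get `default`."""
    return sum(model.get(key, default) for key in enumerate(variant))


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Simulated landscape: one random effect per (site, residue) pair.
effects = {(i, aa): rng.gauss(0.0, 1.0)
           for i in range(N_SITES) for aa in ALPHABET}

# Full combinatorial library: every residue combination over the sites.
library = ["".join(p) for p in itertools.product(ALPHABET, repeat=N_SITES)]

# "Screen" a small random sample and train on it.
train = [(v, toy_fitness(v, effects)) for v in rng.sample(library, 16)]
train_mean = sum(y for _, y in train) / len(train)
model = fit_additive_model(train)


def score(variant):
    return predict(model, variant, default=train_mean)


# Rank all variants in the combinatorial space by predicted fitness.
ranked = sorted(library, key=score, reverse=True)
print("top predicted variants:", ranked[:5])
```

In practice the top-ranked variants would be synthesized and screened in the next round. How well such a ranking works depends on how rugged (epistatic) the landscape is, which is the property the citation statements above discuss.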