2021
DOI: 10.1101/2021.11.18.469179
Preprint

evSeq: Cost-Effective Amplicon Sequencing of Every Variant in a Protein Library

Abstract: Widespread availability of protein sequence-fitness data would revolutionize both our biochemical understanding of proteins and our ability to engineer them. Unfortunately, even though thousands of protein variants are generated and evaluated for fitness during a typical protein engineering campaign, most are never sequenced, leaving a wealth of potential sequence-fitness information untapped. This largely stems from the fact that sequencing is unnecessary for many protein engineering strategies; the added cos…

Cited by 4 publications (4 citation statements)
References 44 publications
“…[3] Wittmann et al have recently published evSeq, a method for cost-efficient sequencing of variable regions within every variant of an engineering campaign based on computational tools and standardized components.[130] evSeq has the potential to drastically increase the amount of available sequence and fitness data, which would normally not be extracted during an enzyme engineering campaign, due to reasons of low time or cost efficiency. • Third, the different accuracies of predictions varying per model and assigned task point out more fundamental connections between data type or construction and inferred rules.…”
Section: Discussion (mentioning)
confidence: 99%
“…Models relying on experimental screening combined with machine learning might be better suited for such tasks. We will highlight a promising approach by Wittmann et al [130] in the following section.…”
Section: Deep-learning Models (mentioning)
confidence: 99%
“…Plasmid DNA was miniprepped (Econospin 96-well filter plate, Epoch Life Science) and verified by Sanger sequencing. Ultrasound-based phenotyping of mutants was performed in BL21-AI (Thermo) as previously described (Hurt et al, n.d.), and all screened mutants were sequenced using the evSeq pipeline (Wittmann et al 2022).…”
Section: Scanning Site Saturation Library Generation and Screening (mentioning)
confidence: 99%
“…Labeled protein data consist of a set of amino acid sequences and how each of those sequences map to a particular protein property of interest, such as thermostability, enzyme activity, or binding affinity. These sequence-function data are commonly generated using protein mutagenesis libraries and medium- or high-throughput assays to assign functional labels [5,6]. Supervised learning approaches such as linear regression or more sophisticated non-linear models can learn from labeled sequence-function data to infer the mapping from sequence to function [7][8][9][10].…”
Section: Introduction (mentioning)
confidence: 99%
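The citing introduction above describes, in general terms, how linear regression can learn a sequence-to-function mapping from labeled variants. The following is a minimal sketch of that idea, not the evSeq pipeline or the cited authors' method. It assumes fixed-length variants over the 20 canonical amino acids, one-hot encoding per position, and ridge-regularised least squares in NumPy. The function names (`one_hot`, `fit_linear`, `predict`) and the toy sequences and fitness values are hypothetical and for illustration only.

```python
import numpy as np

# Hypothetical encoding: 20 canonical amino acids, one column per residue per position.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> np.ndarray:
    """Flatten a sequence into a len(seq) * 20 one-hot feature vector."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x.ravel()


def _design_matrix(seqs) -> np.ndarray:
    """Stack one-hot rows and append a constant column for the intercept."""
    X = np.stack([one_hot(s) for s in seqs])
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_linear(seqs, fitness, l2: float = 1.0) -> np.ndarray:
    """Ridge regression: fitness ~ w . one_hot(seq) + b.

    The penalty is applied to the per-residue weights but not the intercept;
    with l2 > 0 this keeps the normal equations solvable even though the
    one-hot columns at each position are collinear with the intercept.
    """
    X = _design_matrix(seqs)
    y = np.asarray(fitness, dtype=float)
    penalty = l2 * np.eye(X.shape[1])
    penalty[-1, -1] = 0.0  # leave the intercept unpenalised
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)


def predict(weights: np.ndarray, seqs) -> np.ndarray:
    """Predict fitness for new sequences of the same length."""
    return _design_matrix(seqs) @ weights


# Toy, hypothetical data generated by an additive model (no real measurements).
seqs = ["AA", "AC", "CA", "CC", "AD"]
fitness = [0.0, 2.0, 1.0, 3.0, -1.0]
w = fit_linear(seqs, fitness, l2=1e-6)
print(predict(w, ["CD"]))  # an unseen combination of observed mutations
```

Because the model is additive over positions, it can generalize to unseen combinations of observed mutations (here "CD") but cannot capture epistasis. That limitation is one reason the quoted passage also mentions more sophisticated non-linear models.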