2021
DOI: 10.1101/2021.10.28.466307
Preprint

Optimal Design of Stochastic DNA Synthesis Protocols based on Generative Sequence Models

Abstract: Generative probabilistic models of biological sequences have widespread existing and potential applications in analyzing, predicting and designing proteins, RNA and genomes. To test the predictions of such a model experimentally, the standard approach is to draw samples, and then synthesize each sample individually in the laboratory. However, often orders of magnitude more sequences can be experimentally assayed than can affordably be synthesized individually. In this article, we propose instead to use stochas…

Cited by 11 publications (10 citation statements)
References 43 publications
“…We have focused our experiments on these constrained libraries because they are currently more cost-effective, and thus most widely used. Indeed, Weinstein et al [34] showed that for a fixed cost, the use of a constrained library construction can yield orders of magnitude more promising leads in protein engineering than an unconstrained (individual synthesis) approach. As the cost of individual synthesis declines, it will become increasingly useful to use our design approach to specify unconstrained libraries that are both diverse and fit.…”
Section: Results
mentioning
confidence: 99%
“…The efficient manifold hypothesis has direct, practical applications for those trying to evolve proteins in the laboratory. Evolution guided by a language model can be used as a drop-in replacement for current evolutionary tools based on randomization; for example, combinatorial libraries [50], [51] can recombine language-model-guided mutations alongside or instead of rationally chosen mutations [33]. By leveraging increasingly efficient technologies for nucleic acid printing [42], language-model-guided evolution could also directly replace mutagenesis strategies based on, for example, an error-prone polymerase.…”
Section: Discussion
mentioning
confidence: 99%
“…Since then, graph representations of sequences have become widespread as descriptive tools in bioinformatics, used to reconstruct naturally occurring biological sequences. In modern molecular biology and bioengineering, where the design of synthetic biological systems is fundamentally intertwined with the characterization of natural biological systems, there is growing interest in sequence representations amenable to design tasks (16, 17). However, outside of highly specialized applications (18, 19), graph representations of sequences are far less commonly used in design contexts.…”
Section: Discussion
mentioning
confidence: 99%
“…We have implemented the SeqWalk algorithm and additional filtering tools in a pip distributed Python package (seqwalk, source code available at github.com/storyetfall/seqwalk, documented at seqwalk.readthedocs.io). Additionally, … In modern molecular biology and bioengineering, where the design of synthetic biological systems is fundamentally intertwined with the characterization of natural biological systems, there is growing interest in sequence representations amenable to design tasks (16, 17). However, outside of highly specialized applications (18, 19), graph representations of sequences are far less commonly used in design contexts.…”
mentioning
confidence: 99%