2021
DOI: 10.1101/2021.11.02.467003
Preprint

Optimal trade-off control in machine learning-based library design, with application to adeno-associated virus (AAV) for gene therapy

Abstract: AAVs hold tremendous promise as delivery vectors for clinical gene therapy. Yet the ability to design libraries comprising novel and diverse AAV capsids, while retaining the ability of the library to package DNA payloads, has remained challenging. Deep sequencing technologies allow millions of sequences to be assayed in parallel, enabling large-scale probing of fitness landscapes. Such data can be used to train supervised machine learning (ML) models that predict viral properties from sequence, without mechani…

Citations: cited by 15 publications (39 citation statements)
References: 51 publications (108 reference statements)
“…The design problem is a unique setting in which we have control over the data-dependent test input distribution, P_{X;D}, since we choose the procedure used to design an input. In the simplest case, some design procedures sample from a distribution whose form is explicitly chosen, such as an energy-based model whose energy function is proportional to a trained regression model's predictions [10], or whose parameters are set by solving an optimization problem (e.g., to train a generative model) [50,29,12,17,53,70,24,55,74]. In either setting, we know the exact form of the test input distribution, which also absolves the need for density estimation.…”
Section: Algorithm 1 Pseudocode For Approximately Computing
Citation type: mentioning
confidence: 99%
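
To illustrate why an explicitly chosen design distribution needs no density estimation, here is a minimal Python sketch of an energy-based design distribution over a tiny sequence space. Everything in it is an assumption made for illustration: the reduced alphabet, the sequence length, the stand-in `predict_fitness` function (a placeholder for a trained regression model), and the convention that probability is proportional to exp(prediction / temperature). The quoted passage does not specify the sign or scale of the energy function.

```python
import itertools
import math

ALPHABET = "ACDE"  # hypothetical reduced alphabet, small enough to enumerate
LENGTH = 3         # hypothetical sequence length


def predict_fitness(seq):
    """Hypothetical stand-in for a trained regression model's prediction."""
    return float(sum(1 for ch in seq if ch == "A"))


def design_distribution(temperature=1.0):
    """Return exact probabilities p(x) proportional to exp(f(x) / temperature).

    Because the space is enumerated, the normalizing constant is computed
    exactly, so the density of every designed sequence is known in closed form.
    """
    seqs = ["".join(p) for p in itertools.product(ALPHABET, repeat=LENGTH)]
    logits = [predict_fitness(s) / temperature for s in seqs]
    shift = max(logits)  # subtract the max for numerical stability
    unnorm = [math.exp(v - shift) for v in logits]
    z = sum(unnorm)
    return {s: u / z for s, u in zip(seqs, unnorm)}


dist = design_distribution(temperature=0.5)
```

For realistic sequence lengths the space cannot be enumerated, so the normalizing constant would have to be approximated or handled by the sampling method; the sketch only shows that the form of the distribution is known by construction.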
“…The training input distribution, P_X, is also often explicitly known. In protein design problems, for example, training sequences are often generated by introducing random substitutions to a single wild type sequence [12,10,14], by recombining segments of several "parent" sequences [35,52,9,22], or by independently sampling the amino acid at each position from a known distribution [74,67]. Conveniently, we can then compute the weights in Eq.…”
Section: Algorithm 1 Pseudocode For Approximately Computing
Citation type: mentioning
confidence: 99%
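
The equation for the weights is cut off in the excerpt above, so the following Python sketch rests on an assumption: that the weight of a sequence is the ratio of its probability under the design (test) distribution to its probability under the training distribution, as is common under covariate shift. It covers only the last case in the quote, where each position is sampled independently from a known distribution. The function names and the example distributions are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_independent_log_prob(seq, site_probs):
    """Log-probability of seq when each position is sampled independently.

    site_probs[i] maps each amino acid to its probability at position i.
    """
    return sum(math.log(site_probs[i][aa]) for i, aa in enumerate(seq))


def density_ratio(seq, test_probs, train_probs):
    """Assumed weight: test density divided by training density, via logs."""
    log_ratio = (site_independent_log_prob(seq, test_probs)
                 - site_independent_log_prob(seq, train_probs))
    return math.exp(log_ratio)


def biased(favored, p=0.5):
    """Hypothetical site distribution placing mass p on one amino acid."""
    rest = (1.0 - p) / (len(AMINO_ACIDS) - 1)
    return {aa: (p if aa == favored else rest) for aa in AMINO_ACIDS}


uniform = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

train_probs = [uniform, uniform]          # training: uniform at both positions
test_probs = [biased("A"), biased("K")]   # design: favors "A" then "K"

weight = density_ratio("AK", test_probs, train_probs)
```

Working in log space avoids underflow for long sequences, where the product of per-position probabilities becomes very small.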