2019
DOI: 10.1101/757252
Preprint

Protein Design and Variant Prediction Using Autoregressive Generative Models

Abstract: A major biomedical challenge is the interpretation of genetic variation and the ability to design functional novel sequences. Since the space of all possible genetic variation is enormous, there is a concerted effort to develop reliable methods that can capture genotype-to-phenotype maps. State-of-the-art computational methods rely on models that leverage evolutionary information and capture complex interactions between residues. However, current methods are not suitable for a large number of important application…

Cited by 44 publications (51 citation statements). References 99 publications (103 reference statements).
“…The aforementioned work, along with O'Connell et al. (2018), Boomsma & Frellsen (2017), and Greener et al. (2018), all utilize explicit structural information for generative modeling and are thereby unable to fully capture the number and diversity of sequence-only data available. Meanwhile, sequence-only generative modeling has been attempted recently through residual causal dilated convolutional neural networks (Riesselman et al., 2019) and variational autoencoders (Costello & Martin, 2019). Unlike these prior works, our work on generative modeling focuses on high-capacity language models that scale well with sequence data and can be used for controllable generation.…”
Section: Related Work
confidence: 99%
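The "residual causal dilated convolutional" architecture this statement refers to can be sketched in a few lines. The block below is a minimal PyTorch illustration of the general idea (a left-padded dilated 1-D convolution with a residual connection), not the cited implementation; all names are mine.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDilatedBlock(nn.Module):
    """One residual block of a causal dilated 1-D CNN.

    Left padding of (kernel_size - 1) * dilation guarantees that the
    output at position t depends only on inputs at positions <= t,
    which is what makes stacks of these blocks autoregressive.
    """
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                      # x: (batch, channels, length)
        h = F.pad(x, (self.pad, 0))            # pad the left (past) side only
        return x + torch.relu(self.conv(h))    # residual connection

Stacking such blocks with dilations 1, 2, 4, … grows the receptive field exponentially while keeping every position causal.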
“…Next, we compared ECNet to other sequence modeling approaches for mutation effects prediction on a larger set of DMS datasets previously curated in [25]. We first compared to three unsupervised methods: EVmutation [24], DeepSequence [25], and Autoregressive [48]. These methods trained generative models on homologous sequences and predicted the mutation effects by calculating the log-ratio of sequence probabilities of the mutant and wild-type sequences.…”
Section: Accurate Prediction of Functional Fitness Landscape of Proteins
confidence: 99%
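The log-ratio scoring rule described above can be written explicitly (the notation here is mine; $p_\theta$ denotes the sequence probability assigned by the trained generative model):

$$\Delta E(x^{\mathrm{mut}}) = \log p_\theta(x^{\mathrm{mut}}) - \log p_\theta(x^{\mathrm{wt}})$$

A positive score means the model assigns higher probability to the mutant than to the wild type, which is used as a proxy for a tolerated or beneficial mutation.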
“…Generative models of protein sequences such as EVmutation and DeepSequence are dependent on the alignment of homologous sequences, which may introduce artifacts and lose important information due to indels in the alignment. A generative autoregressive model was proposed by Riesselman et al. [48] to predict mutation effects in protein sequences without requiring a multiple sequence alignment.…”
Section: Yang et al. (Doc2vec)
confidence: 99%
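Because an autoregressive model factorizes the sequence probability as p(x) = Π_t p(x_t | x_<t), it can score sequences of any length directly, with no alignment step. Below is a minimal sketch of alignment-free mutation scoring in PyTorch; the model(prefix) -> logits interface and all names are illustrative assumptions, not the authors' code.

import torch
import torch.nn.functional as F

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_IDX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def sequence_log_prob(model, seq):
    """Sum of per-residue log-probabilities under a left-to-right model."""
    idx = torch.tensor([AA_TO_IDX[aa] for aa in seq])
    total = 0.0
    with torch.no_grad():
        # Starts at t=1; a real model would also score x_0 via a start token.
        for t in range(1, len(idx)):
            logits = model(idx[:t])                          # logits over residue t
            total += F.log_softmax(logits, dim=-1)[idx[t]].item()
    return total

def mutation_effect(model, wt_seq, mut_seq):
    # Log-ratio: positive => the model prefers the mutant over the wild type.
    return sequence_log_prob(model, mut_seq) - sequence_log_prob(model, wt_seq)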
“…Deep learning offers one route to better capture the complex relationships between sequence and protein behavior and has been the focus of many recent publications [38-42]. Within the context of discovery and libraries, generative models such as Generative Adversarial Networks (GANs) [43,44] and autoencoder networks (AEs) [45] are of particular interest, as they have been shown to be viable for generating unique sequences of proteins [46,47], nanobodies [48], and antibody CDRs [49]. But these efforts focus on short sequences of proteins or portions of antibodies.…”
Section: Introduction
confidence: 99%
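As a rough sketch of how such latent-variable generative models yield novel sequences once trained: sample a latent vector and decode it to per-position amino-acid logits. The decoder interface below (z -> logits of shape (length, 20)) and all names are assumptions for illustration, not any cited model's API.

import torch

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sample_sequences(decoder, n, latent_dim, temperature=1.0):
    """Decode n sequences from latent samples (VAE decoder / GAN generator style)."""
    seqs = []
    with torch.no_grad():
        for _ in range(n):
            z = torch.randn(latent_dim)              # draw from the latent prior
            logits = decoder(z) / temperature        # assumed shape: (length, 20)
            idx = torch.distributions.Categorical(logits=logits).sample()
            seqs.append("".join(AMINO_ACIDS[i] for i in idx.tolist()))
    return seqs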