2023
DOI: 10.1101/2023.02.11.528149
Preprint

Deep learning-based codon optimization with large-scale synonymous variant datasets enables generalized tunable protein expression

Abstract: Increasing recombinant protein expression is of broad interest in industrial biotechnology, synthetic biology, and basic research. Codon optimization is an important step in heterologous gene expression that can have dramatic effects on protein expression level. Several codon optimization strategies have been developed to enhance expression, but these are largely based on bulk usage of highly frequent codons in the host genome, and can produce unreliable results. Here, we develop deep contextual language model…
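The abstract contrasts the new approach with conventional strategies "based on bulk usage of highly frequent codons in the host genome." As a rough illustration of that baseline only (not the authors' deep learning method), the sketch below back-translates a protein by always choosing the most frequent synonymous codon. The codon usage values and the function name `most_frequent_codon_optimize` are hypothetical, chosen for illustration; they are not measured frequencies for any real host.

```python
# Hypothetical codon usage table: relative frequency of each synonymous codon
# in an imagined host genome. Values are illustrative only, not measured data.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.50, "TTA": 0.13, "TTG": 0.13,
          "CTT": 0.10, "CTC": 0.10, "CTA": 0.04},
    "*": {"TAA": 0.61, "TGA": 0.30, "TAG": 0.09},
}


def most_frequent_codon_optimize(protein: str) -> str:
    """Back-translate a protein by always choosing the host's most frequent codon.

    This is the conventional frequency-based heuristic, not a learned model:
    each residue is mapped independently, ignoring sequence context.
    """
    codons = []
    for residue in protein:
        usage = CODON_USAGE.get(residue)
        if usage is None:
            raise KeyError(f"no codon usage entry for residue {residue!r}")
        # Pick the codon (dict key) with the highest frequency (dict value).
        codons.append(max(usage, key=usage.get))
    return "".join(codons)


if __name__ == "__main__":
    print(most_frequent_codon_optimize("MKL*"))  # ATGAAACTGTAA
```

Because every residue is mapped independently, this heuristic never looks at neighboring codons, which is one plausible reason the abstract sets it against a contextual model.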

Cited by: 8 publications (12 citation statements)
References: 68 publications (89 reference statements)

“…Such understanding may only come from codon optimizing many different types of genes in many different hosts, a project that would be economically infeasible with current DNA synthesis costs. In addition, it is possible that there are even better methods for codon optimizing genes; indeed, recent developments in artificial intelligence and machine learning could play a crucial role in optimizing heterologous gene sequences in the future …”
Section: Discussion | Citation type: mentioning | Confidence: 99%
“…Notably, the BPE algorithm has demonstrated efficacy in routinely identifying prevalent linear patterns in vast data sets. Furthermore, the field continues to expand at an unprecedented pace, with recent developments including the use of generative models for constructing antibody libraries. Machine learning and NLP algorithms are proving to be pivotal in analyzing protein data sets, facilitating a deeper understanding of protein structure, function, and interactions.…”
Section: Discussion | Citation type: mentioning | Confidence: 99%
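The quoted passage credits the BPE (byte-pair encoding) algorithm with identifying prevalent linear patterns in large data sets. A minimal sketch of BPE merge learning follows, assuming character-level starting tokens and toy nucleotide strings chosen for illustration (not data from any cited work): each round counts adjacent token pairs across the corpus and merges the most frequent pair into a single new token. The function names are hypothetical.

```python
from collections import Counter


def most_frequent_pair(corpus):
    """Count adjacent token pairs across all sequences; return the commonest."""
    pairs = Counter()
    for tokens in corpus:
        pairs.update(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(tokens, pair):
    """Replace non-overlapping occurrences of `pair` (left to right) with one token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_bpe(sequences, num_merges):
    """Learn up to `num_merges` BPE merge rules, starting from single characters."""
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(corpus)
        if pair is None:  # no adjacent pairs left to merge
            break
        merges.append(pair)
        corpus = [merge_pair(tokens, pair) for tokens in corpus]
    return merges, corpus


if __name__ == "__main__":
    merges, corpus = learn_bpe(["AAGAAG", "AAT"], num_merges=2)
    print(merges)  # [('A', 'A'), ('AA', 'G')]
    print(corpus)  # [['AAG', 'AAG'], ['AA', 'T']]
```

In this sketch, ties between equally frequent pairs go to the pair counted first, because `Counter.most_common` orders equal counts by first encounter.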
“…The fact that the 64-letter codon alphabet serves to encode richer information than the 20-letter amino acid alphabet can be directly exploited by ML models for improving performance on a wide range of tasks that are now being tackled at the protein sequence level.[273] While there have been several studies, e.g., tackling protein expression optimization,[274] melting temperatures, subcellular localization, solubility or function,[273] we believe further research in this area might provide a strong boost for predicting many essential protein characteristics but will require rethinking of the existing data sets at the nucleotide level.…”
Section: ACS Catalysis (pubs.acs.org/acscatalysis) | Citation type: mentioning | Confidence: 99%
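The point that the 64-letter codon alphabet carries more information than the 20-letter amino acid alphabet follows from the degeneracy of the genetic code: most amino acids are encoded by several synonymous codons. The sketch below is an illustration, not code from the cited work; it builds the standard genetic code (NCBI translation table 1) and counts how many distinct coding sequences translate to a given protein. The names `GENETIC_CODE`, `DEGENERACY`, and `synonymous_sequence_count` are hypothetical.

```python
from collections import Counter
from math import prod

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases in the order T, C, A, G at each position; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Degeneracy: how many of the 64 codons encode each amino acid (or stop).
DEGENERACY = Counter(GENETIC_CODE.values())


def synonymous_sequence_count(protein: str) -> int:
    """Number of distinct coding sequences that translate to `protein`.

    Unknown residue letters have a count of 0, so they make the result 0.
    """
    return prod(DEGENERACY[aa] for aa in protein)


if __name__ == "__main__":
    print(len(GENETIC_CODE), DEGENERACY["L"], synonymous_sequence_count("MKL"))  # 64 6 12
```

For example, `MKL` has 1 × 2 × 6 = 12 synonymous encodings, and the count grows multiplicatively with protein length; this is the sequence space a codon optimizer chooses from while the protein stays the same.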