Preprint (2023)
DOI: 10.1101/2023.08.10.552783

A conditional protein diffusion model generates artificial programmable endonuclease sequences with enhanced activity

Bingxin Zhou,
Lirong Zheng,
Banghao Wu
et al.

Abstract: Computational and deep-learning-based functional protein generation methods address the urgent demand for novel biocatalysts, allowing functionalities to be tailored precisely to specific requirements. This enables the creation of highly efficient, specialized proteins with wide-ranging applications in scientific, technological, and biomedical domains. This study establishes a conditional protein diffusion model, CPDiffusion, to deliver diverse protein sequences with desired functions. W…
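
The core idea named in the abstract, a diffusion model over amino-acid sequences steered by a conditioning signal, can be sketched compactly. The toy below is not the paper's CPDiffusion implementation (which, per the abstract, conditions on richer information than a single label to target desired functions); it is a minimal sketch of absorbing-state discrete diffusion, in which residues are masked at a timestep-dependent rate and a conditional transformer learns to restore them. All names and hyperparameters are hypothetical.

```python
# Minimal sketch of a conditional discrete (absorbing-state) diffusion step for
# protein sequences. Hypothetical stand-in, not the paper's CPDiffusion code:
# a single integer "condition" label stands in for the real conditioning signal.
import torch
import torch.nn as nn
import torch.nn.functional as F

AA = "ACDEFGHIKLMNPQRSTVWY"   # 20 canonical amino acids
MASK = len(AA)                # absorbing "mask" state
VOCAB = len(AA) + 1

class ConditionalDenoiser(nn.Module):
    def __init__(self, n_conditions, d=128, n_heads=4, n_layers=2):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, d)
        self.pos = nn.Embedding(1024, d)
        self.cond = nn.Embedding(n_conditions, d)  # desired-function label
        layer = nn.TransformerEncoderLayer(d, n_heads, 4 * d, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d, len(AA))          # predict original residues

    def forward(self, x, cond):
        pos = torch.arange(x.size(1), device=x.device)
        h = self.tok(x) + self.pos(pos) + self.cond(cond)[:, None, :]
        return self.head(self.encoder(h))

def diffusion_training_step(model, seqs, cond, opt):
    """One step: corrupt residues to MASK at a random rate, learn to restore."""
    t = torch.rand(seqs.size(0), 1)                        # timestep ~ mask rate
    corrupt = torch.rand_like(seqs, dtype=torch.float) < t
    noisy = seqs.masked_fill(corrupt, MASK)
    logits = model(noisy, cond)
    loss = F.cross_entropy(logits[corrupt], seqs[corrupt])  # masked positions only
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage: batch of 8 random length-64 "sequences", 3 function labels.
model = ConditionalDenoiser(n_conditions=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
seqs = torch.randint(0, len(AA), (8, 64))
cond = torch.randint(0, 3, (8,))
print(diffusion_training_step(model, seqs, cond, opt))
```

At generation time one would start from a fully masked sequence and iteratively unmask positions under the desired condition label; training needs only the single corrupt-and-restore step above.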

Cited by 5 publications (3 citation statements). References 41 publications.

“…As an alternative, researchers often opt to pretrain models to encode protein representations from their sequences and/or structures. The learned embeddings can be utilized for de novo protein design, variant effect prediction, higher-level structure prediction, and so on. Analogous to coaching a novice in a new field, a model’s learning efficiency is maximized by initially exposing it to a substantial data set of wild-type (WT) proteins, allowing it to grasp the broader context before delving into specific proteins and their functionalities.…”
Section: Introduction
confidence: 99%
“…As an alternative, researchers often opt to pre-train models to encode protein representations from their sequences and/or structures. The learned embeddings can be utilized for de novo protein design [11–15], variant effect prediction [16–19], higher-level structure prediction [20, 21], etc. Analogous to coaching a novice in a new field, a model's learning efficiency is maximized by initially exposing it to a substantial dataset of wild-type (WT) proteins, allowing it to grasp the broader context before delving into specific proteins and their functionalities.…”
Section: Introduction
confidence: 99%
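
Both statements above describe the same pretrain-then-embed recipe: encode a protein with a pretrained model, then reuse the learned representation downstream. A minimal sketch, assuming the HuggingFace transformers API and a small public ESM-2 checkpoint (an example choice, not a model named by the cited papers): extract per-residue hidden states for a wild-type sequence and mean-pool them into a fixed-size embedding usable for tasks such as variant effect prediction.

```python
# Sketch: pull per-residue embeddings from a pretrained protein language model
# and mean-pool them into a single vector for a downstream task. The checkpoint
# name below is an example ESM-2 model, not one prescribed by the cited papers.
import torch
from transformers import AutoTokenizer, AutoModel

name = "facebook/esm2_t6_8M_UR50D"          # small public ESM-2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
model.eval()

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"   # toy wild-type (WT) sequence
inputs = tokenizer(seq, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# last_hidden_state: (1, sequence length + special tokens, hidden dim)
per_residue = out.last_hidden_state[0, 1:-1]  # drop BOS/EOS special tokens
protein_embedding = per_residue.mean(dim=0)   # fixed-size representation
print(protein_embedding.shape)
```
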
“…Deep learning approaches have been instrumental in advancing scientific insights into proteins, predominantly categorized into two primary domains: sequence-based [30], [34], [60] and structure-based [33], [80], [81] methods. Analogous to the analysis of semantics and syntax in natural language, protein language models (PLMs) interpret protein sequences as raw text and employ autoregressive inference techniques [34], [43] with self-attention mechanisms [3], [41], [51].…”
Section: Introduction
confidence: 99%
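
A toy illustration of the autoregressive, self-attention view this statement describes, with hypothetical names throughout: a causal attention mask keeps each position from attending to later residues, and sampling extends the sequence one amino acid at a time. A real PLM would add positional encodings, many stacked layers, and trained weights.

```python
# Toy causal protein language model: self-attention with a causal mask, plus
# autoregressive sampling of the next residue. Hypothetical, untrained sketch.
import torch
import torch.nn as nn

AA = "ACDEFGHIKLMNPQRSTVWY"

class TinyCausalPLM(nn.Module):
    def __init__(self, d=64, n_heads=4):
        super().__init__()
        self.tok = nn.Embedding(len(AA), d)
        self.attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.head = nn.Linear(d, len(AA))

    def forward(self, x):                     # x: (batch, length) residue ids
        h = self.tok(x)
        L = x.size(1)
        # True above the diagonal = positions may not attend to the future.
        causal = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
        h, _ = self.attn(h, h, h, attn_mask=causal)
        return self.head(h)                   # next-residue logits per position

@torch.no_grad()
def sample(model, prefix="M", n_steps=10):
    ids = torch.tensor([[AA.index(a) for a in prefix]])
    for _ in range(n_steps):
        logits = model(ids)[0, -1]            # logits for the next residue
        nxt = torch.multinomial(torch.softmax(logits, -1), 1)
        ids = torch.cat([ids, nxt[None]], dim=1)
    return "".join(AA[i] for i in ids[0])

print(sample(TinyCausalPLM()))                # untrained model: random-ish output
```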