2023
DOI: 10.1038/s41467-023-36443-x

Direct generation of protein conformational ensembles via machine learning

Abstract: Dynamics and conformational sampling are essential for linking protein structure to biological function. While challenging to probe experimentally, computer simulations are widely used to describe protein dynamics, but at significant computational costs that continue to limit the systems that can be studied. Here, we demonstrate that machine learning can be trained with simulation data to directly generate physically realistic conformational ensembles of proteins without the need for any sampling and at neglig…

Cited by 64 publications (76 citation statements)
References 49 publications
“…Alternative methods based on the translation of the distance map predicted by AlphaFold into structural ensembles using the distances as structural restraints in all-atom molecular dynamics simulations 19,23,24 could be equally time consuming. As shown here, faster results can be obtained by generating initial ensembles using methods based on coarse graining 45,46 or deep learning 21,22, and then relying on reweighting procedures to obtain more accurate statistical weights.…”
Section: Discussion (mentioning)
confidence: 99%
“…, 𝑤 1 2 }. 𝑃 2 (𝑿) can be generated with physics based methods, such as molecular dynamics 27,30, or with machine learning methods such as generative models 21,22. However, various approximations in the generation procedure, for example due to inaccurate force field in molecular dynamics or to low-quality structural data in the training of neural networks, imply that 𝑃 2 (𝑿) might differ from the true distribution of structures 𝑃(𝑿).…”
Section: Reweighting (mentioning)
confidence: 99%
“…However, the computational cost for CTD models substantially increased for the 44-residue CTD as we needed to run 16 replicates to cover the most probable conformational spaces. Therefore, generating the Alternative strategies can be applied for studying full length CTDs, which include coarse-grained MD simulations, 16,59–61 fragmentation of long chains 13,14 or generative machine learning models 19–21 using variational auto-encoder or attention-based approaches. A future direction for our study is to generate conformations of longer CTD models with different phosphorylation patterns using coarse-grained simulations and then develop a machine learning model based on coarse-grained conformations to predict conformational ensembles of CTD sequences in any given length and phosphorylation pattern.…”
Section: Discussion (mentioning)
confidence: 99%
“…5,15–18 Generative machine learning models were also developed, which utilized conformations obtained from MD simulations for training. 19–21 Among computational methods, MD simulations with enhanced sampling techniques appeared to be a powerful method especially for relatively short sequences to provide valuable insights in atomic level interactions of IDPs in order to obtain an accurate conformational space. 15,18,22–25 The C-terminal domain (CTD) of RNA polymerase II (Pol II) is a low complexity domain that contains heptapeptide (YSPTSPS) repeating sequence with the number of repeating units differs.…”
Section: Introduction (mentioning)
confidence: 99%