2020
DOI: 10.1039/c9cp06554g

A review of mathematical representations of biomolecular data

Abstract: Recently, machine learning (ML) has established itself in various worldwide benchmarking competitions in computational biology, including Critical Assessment of Structure Prediction (CASP) and Drug Design Data Resource (D3R) Grand Challenges. However, the intricate structural complexity and high ML dimensionality of biomolecular datasets obstruct the efficient application of ML algorithms in the field. In addition to data and algorithm, an efficient ML machinery for biomolecular predictions must include struct…

Citations: cited by 70 publications (81 citation statements)
References: 163 publications (353 reference statements)
“…This is a prevalent test set to assert the scoring ability of a binding affinity prediction model and has attracted lots of research groups to devote the effort to improve the Pearson's correlation coefficient (R_p) and Kendall's tau (τ) on this core set performance. 18,42,43 In the current work, we merge the PDBbind v2019, SARS-CoV PDB-BA, and SARS-CoV 2D sets but removing the duplicates and excluding the PDBbind v2016 core set complexes to attain a training set of 17 211 complexes. MathDL with the architecture described in Section 3.2.1 is trained on those complexes.…”
Section: Validations; type: mentioning
confidence: 99%
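
The two metrics named in the statement above, and the merge, de-duplicate, and exclude step it describes, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the citing authors' pipeline: the function names and the idea of keying complexes by an ID string are hypothetical, and Kendall's tau is computed in its simple tau-a form, which may differ from the variant the authors report.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient R_p of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def build_training_ids(sources, held_out):
    """Merge ID collections, drop duplicates, and exclude held-out test IDs.

    `sources` is an iterable of ID collections (hypothetical layout);
    `held_out` holds the test-set IDs that must not leak into training.
    """
    merged = set()
    for ids in sources:
        merged |= set(ids)
    return sorted(merged - set(held_out))
```

With experimental and predicted affinities for the held-out complexes in two lists, `pearson_r` measures linear agreement and `kendall_tau` measures agreement in ranking; `build_training_ids` keeps the held-out test set disjoint from the training data.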
“…It consists of 168 integer values that describe the complex of ligand-receptor (compound-protein) by considering the set of eight types of interaction for a pair of AA and an atom. In addition, Nguyen et al. [148] reviewed in detail how biomolecular data of high complexity and dimensionality are converted to features using mathematical methods.…”
Section: Discussion; type: mentioning
confidence: 99%
“…The representation of sequence and structure data for machine-learning has enabled advances in protein design and structure prediction. 34,35 This study showed that backbone geometries of a small number of amino acids (three or four) involved in metal binding can be compressed to an efficient, order-independent, mathematical representation that captures the three-dimensional geometry of the metal-binding amino acids with no loss of prediction accuracy compared with the full set of order-dependent internal coordinates. The compressed features can be fed into a simple and fast machine learning model that can distinguish between different kinds of metal cofactors with an overall accuracy of 95%, and the probabilities that are generated by the machine-learning model are useful for further discernment between FPs and TPs.…”
Section: Discussion; type: mentioning
confidence: 99%
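
The statement above does not say how the order-independent representation is built. As a generic illustration of the idea only, and not the cited study's encoding, one simple order-independent descriptor of a few 3D points is the sorted list of their pairwise distances: it is unchanged when the points are listed in a different order. The function name and the sample coordinates below are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available from Python 3.8


def sorted_pairwise_distances(points):
    """Return the sorted pairwise distances between 3D points.

    Sorting removes any dependence on the order in which the
    points (for example, residue backbone atoms) are listed.
    """
    return sorted(dist(p, q) for p, q in combinations(points, 2))


# Hypothetical coordinates for three points forming a 3-4-5 triangle.
site = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
reordered = [site[2], site[0], site[1]]

features = sorted_pairwise_distances(site)
features_reordered = sorted_pairwise_distances(reordered)
```

Both calls yield the same feature vector, so a downstream classifier would see identical input regardless of residue ordering. For three or four points this gives three or six features, respectively.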