2023
DOI: 10.3390/biom13050833

Motif2Mol: Prediction of New Active Compounds Based on Sequence Motifs of Ligand Binding Sites in Proteins Using a Biochemical Language Model

Abstract: In drug design, the prediction of new active compounds from protein sequence data has only been attempted in a few studies thus far. This prediction task is principally challenging because global protein sequence similarity has strong evolutionary and structural implications, but is often only vaguely related to ligand binding. Deep language models adapted from natural language processing offer new opportunities to attempt such predictions via machine translation by directly relating amino acid sequences and ch…
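To make the "machine translation" idea in the abstract concrete, the following is a minimal sketch, not the authors' Motif2Mol implementation: a character-level encoder-decoder transformer in PyTorch that maps a binding-site sequence motif (amino acid one-letter codes) to a SMILES string. The vocabularies, model dimensions, the class name Motif2SmilesTransformer, and the toy motif/SMILES pair are assumptions made purely for illustration.

import torch
import torch.nn as nn

# Character-level vocabularies; indices 0-2 are reserved for special tokens.
AA_VOCAB = {c: i + 3 for i, c in enumerate("ACDEFGHIKLMNPQRSTVWY")}
SMILES_VOCAB = {c: i + 3 for i, c in enumerate("CNOSPFIBrcnosl=#()[]123456@+-")}
PAD, BOS, EOS = 0, 1, 2  # EOS would terminate generation at inference time


class Motif2SmilesTransformer(nn.Module):
    """Encoder reads the amino acid motif; decoder emits SMILES tokens."""

    def __init__(self, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        self.src_emb = nn.Embedding(len(AA_VOCAB) + 3, d_model, padding_idx=PAD)
        self.tgt_emb = nn.Embedding(len(SMILES_VOCAB) + 3, d_model, padding_idx=PAD)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, len(SMILES_VOCAB) + 3)

    def forward(self, src_ids, tgt_ids):
        # Causal mask so each SMILES position only attends to earlier positions.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(
            self.src_emb(src_ids), self.tgt_emb(tgt_ids), tgt_mask=tgt_mask
        )
        return self.out(hidden)  # logits over SMILES tokens at each position


if __name__ == "__main__":
    motif = "GKGSFGKV"    # hypothetical binding-site motif (illustration only)
    smiles = "c1ccncc1"   # hypothetical active compound (pyridine SMILES)
    src = torch.tensor([[AA_VOCAB[c] for c in motif]])
    tgt = torch.tensor([[BOS] + [SMILES_VOCAB[c] for c in smiles]])

    model = Motif2SmilesTransformer()
    logits = model(src, tgt)
    print(logits.shape)  # torch.Size([1, 9, 32])

In practice, such a model would be trained with cross-entropy loss on paired motif/SMILES data (teacher forcing) and sampled autoregressively at inference time; those steps are omitted from this sketch.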

Cited by 7 publications (2 citation statements)
References 37 publications
“…In both of these studies, conventional protein–ligand docking scores were used to guide compound prioritization. In a different investigation, a transformer was derived to associate extended sequence motifs of ligand binding sites with active compounds [25]. In this case, the ability of the model to exactly reproduce ATP site-directed inhibitors of different kinases not included in model training was used as a proof-of-concept criterion (instead of hypothetical scoring).…”
Section: Introduction
confidence: 99%
“…Potential applications of such models include target validation or compound repurposing. Furthermore, in recent studies, transformer-based language models have been employed to learn mappings of protein sequences to compounds [22][23][24][25]. In the following, models using protein sequence data as input are termed protein language models (PLMs), regardless of the nature of the output sequences.…”
Section: Introduction
confidence: 99%