2018
DOI: 10.1093/bioinformatics/bty287

A novel methodology on distributed representations of proteins using their interacting ligands

Abstract: Motivation: The effective representation of proteins is a crucial task that directly affects the performance of many bioinformatics problems. Related proteins usually bind to similar ligands. Chemical characteristics of ligands are known to capture the functional and mechanistic properties of proteins, suggesting that a ligand-based approach can be utilized in protein representation. In this study, we propose SMILESVec, a Simplified molecular input line entry system (SMILES)-based method to represent ligands and …

Cited by 38 publications (26 citation statements)
References 45 publications
“…LSTM architecture has been successfully employed to tasks such as detecting homology (Hochreiter et al., 2007), constructive peptide design (Muller et al., 2018) and function prediction (Liu, 2017) that utilize amino-acid sequences. As future work, we also aim to utilize a recent ligand-based protein representation method proposed by our team that uses SMILES sequences of the interacting ligands to describe proteins (Öztürk et al., 2018).…”
Section: Results (mentioning)
confidence: 99%
“…Over the past decade, some researchers have successfully applied NLP techniques into biological sequences. One of the pioneering studies is from Asgari and Mofrad (2015) and it had been applied successfully in many later bioinformatics applications (Habibi et al, 2017; Hamid and Friedberg, 2018; Öztürk et al, 2018). However, most studies used the Word2Vec model or FastText model with a single level of N-gram.…”
Section: Introduction (mentioning)
confidence: 99%
“…A large database of ligands was essential to test their binding affinity with the S-protein. For that, we used the data available [30] from GitHub repository [31]. We have used the data available there and converted that into a CSV file containing the SMILES code of over 615,000 distinct ligands.…”
Section: Results (mentioning)
confidence: 99%