2020
DOI: 10.1016/j.csbj.2019.11.004

Learning distributed representations of RNA and protein sequences and its application for predicting lncRNA-protein interactions

Abstract: [Graphical abstract]

Cited by 40 publications (16 citation statements)
References 35 publications (41 reference statements)
“…Deep learning shows excellent performance in many fields when supported by large-scale data; however, ncRPI data sets are generally not large, so deep learning methods are neither especially suitable nor urgently needed here. Previous research confirmed that for the ncRPI prediction task, tree-based and SVM models work well and that sequences contain enough information to predict ncRPIs [25,26]. Traditional machine learning techniques merit further exploration for accuracy and interpretability in small-sample learning tasks, especially ncRNA-protein interaction prediction.…”
Section: Introduction (mentioning)
confidence: 87%
“…This kind of model is a shallow two-layer neural network. In recent bioinformatics studies, such methods [22,23] have been used to train word embedding models for DNA, proteins, and lncRNAs, and this approach has been shown to outperform traditional sequence embedding methods such as one-hot encoding and k-mer counts.…”
Section: Distributed Representation of miRNA and mRNA Sequences (mentioning)
confidence: 99%
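To illustrate the approach the statement above describes, here is a minimal sketch (not the cited papers' exact pipelines) of training a shallow skip-gram word2vec model on sequences split into overlapping k-mer "words", using gensim. The toy RNA sequences and all hyperparameters are assumptions chosen for demonstration only.

```python
from gensim.models import Word2Vec

def to_kmer_sentence(seq, k=3):
    """Split a sequence into overlapping k-mers, treated as the 'words' of a sentence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

# Toy RNA sequences; a real corpus would contain thousands of sequences.
sequences = ["AUGGCUAGCUAGGAU", "GCUAGGAUUCGAUGC", "AUGCGGAUACGUUAG"]
corpus = [to_kmer_sentence(s) for s in sequences]

# Shallow two-layer skip-gram model (sg=1), as described in the excerpt above.
model = Word2Vec(corpus, vector_size=64, window=5, min_count=1, sg=1, epochs=50)

vec = model.wv["AUG"]  # distributed representation of the 3-mer 'AUG'
print(vec.shape)       # (64,)
```

Each k-mer thus receives a dense vector shaped by the contexts it appears in, unlike a one-hot code, which treats all k-mers as equally dissimilar.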
“…One of the most successful word embedding-based models is the word2vec model (Mikolov, Chen, Corrado, & Dean, 2013) for generating distributed representations of words and phrases. Considerable advances have been made with its standard application (Asgari & Mofrad, 2015a), and its use has been extended to modelling DNA (Ng, 2017), RNA (Yi et al., 2020) and protein (Asgari & Mofrad, 2015b) sequences. To briefly summarize those studies, projecting sequence data onto embedded spaces is likely to reduce the complexity of the algorithms needed to solve certain tasks (e.g.…”
Section: Continuous Distributed Representations for Protein Sequences (mentioning)
confidence: 99%
“…
• each protein sequence is treated as a sentence, made of overlapping words (k-mers) to incorporate some context-order information in the resulting distributed representation;
• the word size is 3, which appears to work well for embedding amino acid sequences in biological tasks (S. Cheng et al., 2019; Yi et al., 2020);
• the sequence vector is defined as the arithmetic mean of all its word vectors (see the sketch after this list).…”
Section: Continuous Distributed Representations for Protein Sequences (mentioning)
confidence: 99%
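The three steps in the excerpt above translate directly into a short mean-pooling routine. The sketch below assumes a word-vector lookup (e.g. gensim's `model.wv`) already trained on overlapping 3-mers, as in the earlier example; the function name and the usage line are hypothetical illustrations, not the cited papers' code.

```python
import numpy as np

def sequence_vector(seq, wv, k=3):
    """Embed a sequence as the arithmetic mean of its overlapping k-mer vectors.

    `wv` is a trained word-vector lookup (e.g. gensim's model.wv);
    k-mers absent from the vocabulary are simply skipped.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    vecs = [wv[km] for km in kmers if km in wv]
    if not vecs:
        raise ValueError("no k-mer of the sequence is in the vocabulary")
    return np.mean(vecs, axis=0)

# Hypothetical usage, assuming `model` was trained on protein 3-mer sentences:
# emb = sequence_vector("MKTAYIAKQR", model.wv)  # fixed-length vector, e.g. (64,)
```

Mean pooling yields a fixed-length vector regardless of sequence length, which is what lets downstream classifiers such as the tree-based and SVM models mentioned earlier consume variable-length sequences.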