2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2019
DOI: 10.1109/bibm47256.2019.8983072

Unaligned Sequence Similarity Search Using Deep Learning

Abstract: Gene annotation has traditionally required direct comparison of DNA sequences between an unknown gene and a database of known ones using string comparison methods. However, these methods do not provide useful information when a gene does not have a close match in the database. In addition, each comparison can be costly when the database is large since it requires alignments and a series of string comparisons. In this work we propose a novel approach: using recurrent neural networks to embed DNA or amino-acid s…

Citations: cited by 1 publication (1 citation statement)
References: 46 publications
“…These CNN models outperform variants of BLAST [2] models, which achieve ∼1.14-1.654% error rate. Senter et al [18] considers the same high-level question as us of using protein embeddings for a vector similarity-based search. They train neural network models for classifying proteins from the RefSeq [13] database and find that such a training procedure, which is analogous to our lens training, produces embeddings that generalize to and separate unseen classes of proteins, allowing for effective nearest-neighbors classification.…”
Section: Related Work (mentioning)
confidence: 99%
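
The citation statement above describes classifying proteins by nearest-neighbor search over learned embeddings. A minimal sketch of that search step follows. It assumes the embeddings already exist as fixed-length vectors: the toy vectors, the labels, the choice of cosine similarity, and the function names `cosine_similarity` and `knn_classify` are all illustrative assumptions, not the method or data of either paper, and the recurrent network that would produce the embeddings is not reproduced here.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, database, k=3):
    """Return the majority label among the k database entries most similar to query.

    database is a list of (embedding, label) pairs. This is a brute-force scan,
    which is fine for a toy example but not for a large sequence database.
    """
    ranked = sorted(
        database,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical 3-dimensional embeddings; real ones would come from a trained model.
database = [
    ([1.0, 0.1, 0.0], "kinase"),
    ([0.9, 0.2, 0.1], "kinase"),
    ([0.0, 1.0, 0.2], "transporter"),
    ([0.1, 0.9, 0.3], "transporter"),
]
query = [0.95, 0.15, 0.05]
predicted = knn_classify(query, database, k=3)
print(predicted)
```

Because the comparison is a vector operation on precomputed embeddings, no sequence alignment is needed at query time, which is the cost the abstract attributes to string-comparison methods.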