2017
DOI: 10.1101/229708
Preprint

TahcoRoll: An Efficient Approach for Signature Profiling in Genomic Data through Variable-Length k-mers

Abstract: K-mer profiling has been one of the trending approaches to analyze read data generated by high-throughput sequencing technologies. The tasks of k-mer profiling include, but are not limited to, counting the frequencies and determining the occurrences of short sequences in a dataset. The notion of k-mer has been extensively used to build de Bruijn graphs in genome or transcriptome assembly, which requires examining all possible k-mers presented in the dataset. Recently, an alternative way of profil…
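
The two profiling tasks named in the abstract, counting frequencies and determining occurrences of short sequences, can be illustrated with a naive sketch. This is not the TahcoRoll algorithm, which the truncated abstract does not describe; the function name `profile_signatures`, the toy reads, and the signature set are all hypothetical, chosen only to show what profiling a fixed set of variable-length k-mers means.

```python
from collections import Counter


def profile_signatures(reads, signatures):
    """Count how often each signature k-mer occurs across all reads.

    Naive baseline (an assumption for illustration, not the paper's method):
    for every distinct signature length k, slide a window of size k over
    each read and look the substring up in a hash set.
    """
    wanted = set(signatures)
    lengths = sorted({len(s) for s in wanted})
    # Start every signature at zero so absent signatures are reported too.
    counts = Counter({s: 0 for s in wanted})
    for read in reads:
        for k in lengths:
            # range() is empty when the read is shorter than k.
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                if kmer in wanted:
                    counts[kmer] += 1
    return counts


# Hypothetical toy data: two reads and three signatures of different lengths.
reads = ["ACGTACGT", "TTACGA"]
signatures = ["ACG", "TACGT", "GGG"]
profile = profile_signatures(reads, signatures)
```

A count above zero answers the occurrence question and the count itself gives the frequency. The cost of this baseline grows with the number of distinct signature lengths, since each length requires its own pass over every read.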

Cited by 3 publications (3 citation statements)
References 32 publications
“…In order to extract more comprehensive features and achieve superior robustness and predictive performance, we opt for Bi-GRU. Given the intricate local contextual relationships inherent in gene sequences, we segment the complete gene sequences into multiple k-mer fragments [20]. Then each unique k-mer fragment is embedded and represented by one-hot encoding [21].…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…To represent different positions in the sequence, we use k-mers as representations because k-mers are capable of preserving more complicated local contexts (Ju et al., 2017). Each unique k-mer is then mapped to a continuous embedding vector as various deep learning approaches in bioinformatics (Chaabane et al., 2020; Min et al., 2017).…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…k-mer Embedding. To represent different positions in the sequence, we use k-mers as representations because k-mers are capable of preserving more complicated local contexts [29]. Each unique k-mer are then mapped to a continuous embedding vector as various deep learning approaches in bioinformatics [6,35].…”
Section: Attentive Junction Encoders
Citation type: mentioning
confidence: 99%