TahcoRoll: An Efficient Approach for Signature Profiling in Genomic Data through Variable-Length k-mers

Ju, Chelsea J.‐T.; Jiang, Jyun-Yu; Li, Ruirui; Li, Zeyu; Wang, Wei

doi:10.1101/229708

Cited by 3 publications

(3 citation statements)

References 32 publications


“…In order to extract more comprehensive features and achieve superior robustness and predictive performance, we opt for Bi-GRU. Given the intricate local contextual relationships inherent in gene sequences, we segment the complete gene sequences into multiple k-mer fragments [20]. Then each unique k-mer fragment is embedded and represented by one-hot encoding [21].…”

Section: Methods (mentioning)

confidence: 99%

Inference of gene regulatory networks based on directed graph convolutional networks

Wei, Guo, Gao et al. 2024

Briefings in Bioinformatics


Inferring gene regulatory networks (GRNs) is one of the important challenges in systems biology, and many outstanding computational methods have been proposed; however, some challenges remain, especially on real datasets. In this study, we propose a Directed Graph Convolutional neural network-based method for GRN inference (DGCGRN). To better understand and process the directed graph structure data of GRNs, a directed graph convolutional neural network is used, which retains the structural information of the directed graph while also making full use of neighbor node features. A local augmentation strategy is adopted in the graph neural network to solve the problem of poor prediction accuracy caused by a large number of low-degree nodes in GRNs. In addition, for real data such as E. coli, sequence features are obtained by extracting hidden features using Bi-GRU and calculating the statistical physicochemical characteristics of the gene sequence. At the training stage, a dynamic update strategy is used to convert the obtained edge prediction scores into edge weights to guide the subsequent training process of the model. The results on synthetic benchmark datasets and real datasets show that the prediction performance of DGCGRN is significantly better than that of existing models. Furthermore, case studies on bladder uroepithelial carcinoma and lung cancer cells also illustrate the performance of the proposed model.
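The k-mer segmentation and one-hot encoding mentioned in the citation statement above can be illustrated with a minimal sketch. This is not the DGCGRN implementation: the function names, the ACGT alphabet, and the stride-1 sliding window are all assumptions made for illustration, and sequences containing other symbols (such as N) are not handled.

```python
from itertools import product


def kmers(sequence, k):
    """Return all overlapping k-mers of `sequence` (sliding window, stride 1)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def one_hot_kmers(sequence, k, alphabet="ACGT"):
    """One-hot encode each k-mer over the vocabulary of all possible k-mers.

    The vocabulary has len(alphabet) ** k entries, ordered lexicographically
    by the order of symbols in `alphabet` (an assumption of this sketch).
    """
    vocab = {"".join(p): idx for idx, p in enumerate(product(alphabet, repeat=k))}
    encoded = []
    for kmer in kmers(sequence, k):
        vec = [0] * len(vocab)
        vec[vocab[kmer]] = 1  # exactly one position is set per k-mer
        encoded.append(vec)
    return encoded
```

Because the one-hot dimension grows as 4^k for DNA, this representation is only practical for small k; larger k usually calls for a learned embedding instead.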


“…To represent different positions in the sequence, we use k-mers as representations because k-mers are capable of preserving more complicated local contexts (Ju et al., 2017). Each unique k-mer is then mapped to a continuous embedding vector as various deep learning approaches in bioinformatics (Chaabane et al., 2020; Min et al., 2017).…”

Section: Methods (mentioning)

confidence: 99%

JEDI: circular RNA prediction based on junction encoders and deep interaction among splice sites

Jiang, Hao et al. 2021

Bioinformatics

Self Cite


Motivation: Circular RNA (circRNA) is a novel class of long non-coding RNAs that have been broadly discovered in the eukaryotic transcriptome. The circular structure arises from a non-canonical splicing process, where the donor site backsplices to an upstream acceptor site. These circRNA sequences are conserved across species. More importantly, rising evidence suggests their vital roles in gene regulation and association with diseases. As the fundamental effort toward elucidating their functions and mechanisms, several computational methods have been proposed to predict the circular structure from the primary sequence. Recently, advanced computational methods leverage deep learning to capture the relevant patterns from RNA sequences and model their interactions to facilitate the prediction. However, these methods fail to fully explore positional information of splice junctions and their deep interaction.

Results: We present a robust end-to-end framework, Junction Encoder with Deep Interaction (JEDI), for circRNA prediction using only nucleotide sequences. JEDI first leverages the attention mechanism to encode each junction site based on deep bidirectional recurrent neural networks and then presents the novel cross-attention layer to model deep interaction among these sites for backsplicing. Finally, JEDI can not only predict circRNAs but also interpret relationships among splice sites to discover backsplicing hotspots within a gene region. Experiments demonstrate JEDI significantly outperforms state-of-the-art approaches in circRNA prediction on both isoform level and gene level. Moreover, JEDI also shows promising results on zero-shot backsplicing discovery, which none of the existing approaches can achieve.

Availability and implementation: The implementation of our framework is available at https://github.com/hallogameboy/JEDI.

Supplementary information: Supplementary data are available at Bioinformatics online.
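The step quoted above, in which each unique k-mer is mapped to a continuous embedding vector, can be sketched as a simple lookup table. This is an illustrative sketch only and not the JEDI code: the helper names, the embedding dimension, and the random initialisation are assumptions, and in a trained model the table entries would be learned parameters rather than fixed random values.

```python
import random


def build_kmer_vocab(sequences, k):
    """Assign an integer id to each unique k-mer, in order of first appearance."""
    vocab = {}
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            vocab.setdefault(seq[i:i + k], len(vocab))
    return vocab


def init_embeddings(vocab_size, dim, seed=0):
    """Create a (vocab_size x dim) table of small random values (hypothetical init)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed_sequence(seq, k, vocab, table):
    """Replace each k-mer position in `seq` with its embedding vector."""
    return [table[vocab[seq[i:i + k]]] for i in range(len(seq) - k + 1)]
```

Identical k-mers at different positions share one row of the table, so the representation size depends on the number of distinct k-mers observed rather than on 4^k.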


“…k-mer Embedding. To represent different positions in the sequence, we use k-mers as representations because k-mers are capable of preserving more complicated local contexts [29]. Each unique k-mer are then mapped to a continuous embedding vector as various deep learning approaches in bioinformatics [6,35].…”

Section: Attentive Junction Encoders (mentioning)

confidence: 99%

JEDI: Circular RNA Prediction based on Junction Encoders and Deep Interaction among Splice Sites

Jiang, Hao et al. 2020

Preprint

Self Cite


Circular RNA is a novel class of endogenous non-coding RNAs that have been largely discovered in the eukaryotic transcriptome. The circular structure arises from a non-canonical splicing process, where the donor site backsplices to an upstream acceptor site. These circular forms of RNA are conserved across species, and often show tissue- or cell-specific expression. Emerging evidence suggests their vital roles in gene regulation, which are further associated with various types of diseases. As the fundamental effort to elucidate their function and mechanism, numerous efforts have been devoted to predicting circular RNA from its primary sequence. However, statistical learning methods are constrained by the information presented with explicit features, and the existing deep learning approach falls short of fully exploring the positional information of the splice sites and their deep interaction. We present an effective and robust end-to-end framework, JEDI, for circular RNA prediction using only the nucleotide sequence. Our framework first leverages the attention mechanism to encode each junction site based on deep bidirectional recurrent neural networks and then presents the novel cross-attention layer to model deep interaction among these sites for backsplicing. Finally, JEDI is capable of not only addressing the task of circular RNA prediction but also interpreting the relationships among splice sites to discover the hotspots for backsplicing within a gene region. Experimental evaluations demonstrate that JEDI significantly outperforms several state-of-the-art approaches in circular RNA prediction at both the isoform level and the gene level. Moreover, JEDI also shows promising results on zero-shot backsplicing discovery, which none of the existing approaches can achieve. The implementation of our framework is available at https://github.com/hallogameboy/JEDI.
