2023
DOI: 10.1101/2023.01.04.522704
Preprint
MuLan-Methyl - Multiple Transformer-based Language Models for Accurate DNA Methylation Prediction

Abstract: Transformer-based language models are successfully used to address massive text-related tasks. DNA methylation is an important epigenetic mechanism, and its analysis provides valuable insights into gene regulation and biomarker identification. Several deep learning-based methods have been proposed to identify DNA methylation, and each seeks to strike a balance between computational effort and accuracy. Here, we introduce MuLan-Methyl, a deep learning framework for predicting DNA methylation sites, which is bas…


Cited by 5 publications (11 citation statements)
References 61 publications (70 reference statements)
“…Accuracy 23 refers to the proportion of correct predictions with respect to the total predictions. Specificity or true negative rate (TNR) 20 is the model’s ability to correctly predict the negative class samples.…”
Section: Methods
Classification: mentioning (confidence: 99%)
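As a quick illustration of the two measures described in this excerpt, here is a minimal Python sketch computed from confusion-matrix counts; the function names and example counts are illustrative and not taken from MuLan-Methyl or the citing papers.

def accuracy(tp, tn, fp, fn):
    # Proportion of correct predictions among all predictions.
    return (tp + tn) / (tp + tn + fp + fn)

def specificity(tn, fp):
    # True negative rate: correct negative predictions over all actual negatives.
    return tn / (tn + fp)

# Hypothetical counts: accuracy(40, 45, 5, 10) == 0.85, specificity(45, 5) == 0.9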
“…It is determined by dividing the number of correct negative predictions by the total number of true negatives. Sensitivity (or recall) 23 measures the ability of the model to predict positive class samples by taking the ratio of correct positive predictions to the predictions on positive samples. MCC 78 calculates the correlation between the model predictions and the true class, by taking into consideration true positives, true negatives, false positives, and false negatives.…”
Section: Methods
Classification: mentioning (confidence: 99%)
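In the same spirit, a small sketch for sensitivity and MCC, again with illustrative confusion-matrix counts rather than values from the cited work:

import math

def sensitivity(tp, fn):
    # Recall / true positive rate: correct positive predictions over all actual positives.
    return tp / (tp + fn)

def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient computed from the four confusion-matrix cells.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts: sensitivity(40, 10) == 0.8; mcc(40, 45, 5, 10) ≈ 0.70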
“…It is determined by dividing the number of correct negative predictions by the total number of true negatives. Sensitivity (or recall) 23 In the mathematical expression above, T + and T − denote the true predictions related to positive and negative classes, whereas F + and F − are the incorrect predictions related to the positive and negative classes respectively.…”
Section: Evaluation Measures
Classification: mentioning (confidence: 99%)
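The excerpt above refers to a mathematical expression that was cut off in the extract. Under the notation it defines (T+ and T− for correct positive and negative predictions, F+ and F− for incorrect ones), the standard forms of these measures, which may differ in detail from the citing paper's own expression, would read in LaTeX:

\mathrm{Specificity} = \frac{T^{-}}{T^{-} + F^{+}}, \qquad
\mathrm{Sensitivity} = \frac{T^{+}}{T^{+} + F^{-}}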
“…For example, self-supervised tasks such as masked language modelling (MLM) have recently been used to pretrain genomic sequence embeddings that are then fine-tuned for downstream tasks (e.g. Ji et al (2021); Mo et al (2021); Benegas et al (2022); Zeng et al (2023)). Pretraining using task-relevant data can improve the performance of fine-tuned models (Gururangan et al, 2020), while pretraining using irrelevant data can hurt performance (Liu et al, 2022).…”
Section: Introduction
Classification: mentioning (confidence: 99%)
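To make the masked language modelling (MLM) objective mentioned in this excerpt concrete, below is a minimal PyTorch-style sketch of the masking step. It is not MuLan-Methyl's actual training code; the toy vocabulary, mask-token id, and encoder in the usage comments are placeholders.

import torch
import torch.nn.functional as F

def mask_tokens(input_ids, mask_token_id, mlm_probability=0.15):
    # Randomly hide a fraction of tokens; labels are -100 everywhere except the
    # masked positions, so the loss is computed only on what the model must recover.
    labels = input_ids.clone()
    mask = torch.rand(input_ids.shape) < mlm_probability
    labels[~mask] = -100
    masked_inputs = input_ids.clone()
    masked_inputs[mask] = mask_token_id
    return masked_inputs, labels

# Hypothetical usage with a toy 5-token vocabulary (A, C, G, T, [MASK]):
# ids = torch.randint(0, 4, (2, 16))            # batch of tokenised DNA sequences
# inputs, labels = mask_tokens(ids, mask_token_id=4)
# logits = encoder(inputs)                      # any token-level encoder -> (batch, length, vocab)
# loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1), ignore_index=-100)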