2019
DOI: 10.1093/bioinformatics/btz408

Iterative feature representations improve N4-methylcytosine site prediction

Abstract: Motivation: Accurate genome-wide identification of N4-methylcytosine (4mC) modifications can provide insights into their biological functions and mechanisms. Machine learning methods have recently become effective approaches for the computational identification of 4mC sites in genomes. Unfortunately, existing methods cannot achieve satisfactory performance, owing to the lack of effective DNA feature representations that are capable of capturing the characteristics of 4mC modifications. …

Citations: cited by 112 publications (65 citation statements)
References: 28 publications
“…Finally, we get the new C. elegans dataset with 17,808 samples, which contains 11,173 positive samples and 6,635 negative samples. The positive samples are the sequences centered on functional 4mC sites detected by the SMRT sequencing technology, while the negative samples are the sequences with cytosines in the center that were not detected as 4mC (Wei et al, 2019). The new dataset can be downloaded from our github, and the download link is given in section 3.…”
Section: Datasets (mentioning)
confidence: 99%
“…For performance evaluation, we used the following five generally used metrics: Sensitivity (SN), Specificity (SP), Accuracy (ACC), Matthews Correlation Coefficient (MCC) (Wei et al, 2019) and Area Under the ROC Curve (AUC). The definition of each evaluation metric is as follows: where TP indicates that the actual result is a positive sample, and the predicted result is also a positive sample; TN indicates that the actual result is a negative sample, and the predicted result is also a negative sample; FP indicates that the actual result is a negative sample, and the predicted result is a positive sample (indicating that the negative sample is predicted incorrectly); FN indicates that the actual result is a positive sample, and the predicted result is a negative sample (indicating that the positive sample is predicted incorrectly).…”
Section: Performance Evaluation (mentioning)
confidence: 99%
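
The metric formulas themselves are not reproduced in the excerpt above. As a rough illustration only, the sketch below computes SN, SP, ACC and MCC from the four confusion-matrix counts using their standard textbook definitions. The function name `confusion_metrics` and the choice to return 0.0 when a denominator is zero are assumptions made for this example, not details taken from the citing paper. AUC is left out because it depends on ranked prediction scores rather than on the four counts alone.

```python
import math


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute SN, SP, ACC and MCC from confusion-matrix counts.

    Hypothetical helper for illustration; zero denominators yield 0.0
    by assumption rather than raising an error.
    """
    total = tp + tn + fp + fn

    # Sensitivity: fraction of actual positives predicted as positive.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of actual negatives predicted as negative.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # Accuracy: fraction of all samples predicted correctly.
    acc = (tp + tn) / total if total else 0.0

    # Matthews correlation coefficient, in the range [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts, purely to show the call shape.
    for name, value in confusion_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.4f}")
```

MCC is often reported alongside ACC for datasets such as the one quoted earlier, where positive and negative samples are not balanced, because it uses all four counts.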