Identification of D Modification Sites Using a Random Forest Model Based on Nucleotide Chemical Properties

Zhu, Huan; Ao, Chunyan; Ding, Yongsheng; Hao, Hongxia; Yu, Liang

doi:10.3390/ijms23063044

Cited by 7 publications

(7 citation statements)

References 58 publications


“…This section summarizes 11 remaining encoders, namely nucleic acid composition (NAC) 62, enhanced nucleic acid composition (ENAC) 63, accumulated nucleotide frequency (ANF) 64, dinucleotide composition (DNC) 65, trinucleotide composition (TNC) 66, nucleotide chemical property (NCP) 67, binary 68, electron-ion interaction potential (EIIP) 57, series correlation pseudo dinucleotide composition (SCPseDNC) 69, pseudo dinucleotide composition (PseDNC) 70, 71, and pseudo k-tupler composition (PseKNC) 72.…”

Section: Methods (mentioning)

confidence: 99%
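The encoders listed above turn a nucleotide sequence into a numeric vector. As an illustration, here is a minimal sketch of two of them, NCP and ANF, assuming the commonly used NCP table (A = (1, 1, 1), C = (0, 1, 0), G = (1, 0, 0), U/T = (0, 0, 1)) and defining ANF at position i as the frequency of that position's nucleotide among the first i positions. The function name `ncp_anf_encode` is hypothetical and is not taken from any of the cited papers.

```python
# Nucleotide chemical property (NCP) vectors: (ring structure, hydrogen bonding,
# functional group). Assumption: this is the commonly used table from the
# literature, not necessarily the exact encoding of any paper listed here.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
    "T": (0, 0, 1),  # T treated like U so DNA input also works
}


def ncp_anf_encode(seq: str) -> list[float]:
    """Encode each position as its 3 NCP bits plus its accumulated frequency (ANF).

    ANF at 1-based position i = (count of seq[i-1] in seq[:i]) / i.
    Returns a flat vector of length 4 * len(seq).
    """
    seq = seq.upper()
    counts: dict[str, int] = {}
    features: list[float] = []
    for i, base in enumerate(seq, start=1):
        counts[base] = counts.get(base, 0) + 1
        features.extend(float(bit) for bit in NCP[base])  # NCP part
        features.append(counts[base] / i)                 # ANF part
    return features
```

For example, `ncp_anf_encode("ACGA")` yields 16 values, four per position; the final A contributes (1, 1, 1, 0.5) because two of the first four nucleotides are A.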

“…Similarly, dinucleotide composition (DNC) 65 and trinucleotide composition (TNC) 66 use groups of adjacent nucleotides (k = 2 or k = 3) to compute normalized occurrence frequencies rather than considering individual nucleotides. Enhanced nucleic acid composition (ENAC) 63 transforms raw sequences into statistical vectors by counting the different k-mers within a fixed-size sliding window. First, a dictionary of unique k-mers is created; then, within each window, the count of each unique k-mer is computed.…”

Section: Methods (mentioning)

confidence: 99%
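The k-mer descriptions quoted above can be sketched as follows. This is an illustrative sketch, not the cited authors' code: the helper names, the default window size of 5, and the step of one nucleotide are assumptions. `kmer_composition` returns normalized frequencies for every k-mer over the alphabet (k = 1 gives NAC, k = 2 DNC, k = 3 TNC), and `enac` counts each k-mer from a fixed dictionary inside every sliding window.

```python
from itertools import product


def kmer_composition(seq: str, k: int, alphabet: str = "ACGU") -> list[float]:
    """Normalized occurrence frequency of every k-mer (NAC: k=1, DNC: k=2, TNC: k=3)."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping k-mer positions
    for i in range(max(total, 0)):
        kmer = seq[i:i + k]
        if kmer in counts:  # k-mers with other symbols (e.g. N) are not counted
            counts[kmer] += 1
    if total <= 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def enac(seq: str, window: int = 5, k: int = 1, alphabet: str = "ACGU") -> list[int]:
    """ENAC-style features: k-mer counts inside each fixed-size window, slid by 1.

    Assumption: window size and step are illustrative defaults; use
    alphabet="ACGT" for DNA input.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]  # dictionary of unique k-mers
    features: list[int] = []
    for start in range(len(seq) - window + 1):
        sub = seq[start:start + window]
        for m in kmers:
            features.append(sum(1 for j in range(len(sub) - k + 1) if sub[j:j + k] == m))
    return features
```

With k = 1 and a window of 5, `enac("ACGUAC")` sees two windows, ACGUA and CGUAC, and returns [2, 1, 1, 1, 1, 2, 1, 1].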


Long extrachromosomal circular DNA identification by fusing sequence-derived features of physicochemical properties and nucleotide distribution patterns

Abbasi, Asim, Ahmed et al. 2024

Sci Rep


Long extrachromosomal circular DNA (leccDNA) regulates several biological processes such as genomic instability, gene amplification, and oncogenesis. The identification of leccDNA is important for investigating its potential associations with cancer, autoimmune, cardiovascular, and neurological diseases. In addition, understanding these associations can provide valuable insights into disease mechanisms and potential therapeutic approaches. Conventionally, leccDNA is identified with wet-lab methods, which are hindered by the need for prior knowledge and by resource-intensive processes, potentially limiting their broader applicability. To enable leccDNA identification across multiple species, this paper presents the first computational predictor. The proposed iLEC-DNA predictor uses an SVM classifier together with sequence-derived nucleotide distribution patterns and physicochemical-property-based features. In addition, the study introduces a set of 12 benchmark leccDNA datasets covering three species, namely Homo sapiens (HM), Arabidopsis thaliana (AT), and Saccharomyces cerevisiae (SC/YS). It performs large-scale experimentation across the 12 benchmark datasets under different experimental settings using the proposed predictor, more than 140 baseline predictors, and 858 encoder ensembles. The proposed predictor outperforms the baseline predictors and encoder ensembles across diverse leccDNA datasets, achieving average ACC, MCC and AUC-ROC values of 81.09%, 62.2% and 81.08% across all datasets. The source code of the proposed and baseline predictors is available at https://github.com/FAhtisham/Extrachrosmosomal-DNA-Prediction. To facilitate the scientific community, a web application for leccDNA identification is available at https://sds_genetic_analysis.opendfki.de/iLEC_DNA/.
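The abstract reports ACC, MCC and AUC-ROC. Below is a minimal sketch of their standard definitions, assuming binary labels where 1 marks a leccDNA sequence; the function names are hypothetical and not taken from the linked repository. MCC is set to 0 when its denominator vanishes, a common convention rather than something the abstract specifies, and AUC-ROC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count as one half).

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(scores: list[float], labels: list[int]) -> float:
    """AUC-ROC by comparing every positive score with every negative score (O(P*N))."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```

For a confusion matrix with TP = TN = 40 and FP = FN = 10, ACC is 0.8 and MCC is 0.6. The pairwise AUC is quadratic in the number of examples, which is fine for illustration; a rank-based formula scales better on large datasets.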




“…Similarly, dinucleotide composition (DNC) 46 and trinucleotide composition (TNC) 47 use groups of adjacent nucleotides (k = 2 or k = 3) to compute normalized occurrence frequencies rather than considering individual nucleotides. Enhanced nucleic acid composition (ENAC) 44 transforms raw sequences into statistical vectors by counting the different k-mers within a fixed-size sliding window. First, a dictionary of unique k-mers is created; then, within each window, the count of each unique k-mer is computed.…”

Section: Methods (mentioning)

confidence: 99%

iLEC-DNA: Identifying Long Extra-chromosomal Circular DNA by Fusing Sequence-derived Features of Physicochemical Properties and Nucleotide Distribution Patterns

Abbasi, Asim, Dengel et al. 2023

Preprint


Long extrachromosomal circular DNA (leccDNA) regulates several biological processes such as genomic instability, gene amplification, and oncogenesis. The identification of leccDNA is important for investigating its potential associations with cancer, autoimmune, cardiovascular, and neurological diseases. In addition, understanding these associations can provide valuable insights into disease mechanisms and potential therapeutic approaches. Conventionally, leccDNA is identified with wet-lab methods, which are hindered by the need for prior knowledge and by resource-intensive processes, potentially limiting their broader applicability. To enable leccDNA identification across multiple species, this paper presents the first computational predictor. The proposed iLEC-DNA predictor uses an SVM classifier together with sequence-derived nucleotide distribution patterns and physicochemical-property-based features. In addition, the study introduces a set of 12 benchmark leccDNA datasets covering three species, namely HM, AT, and YS. It performs large-scale experimentation across the 12 benchmark datasets under different experimental settings using the proposed predictor and more than 140 baseline predictors. The proposed predictor outperforms the baseline predictors across diverse leccDNA datasets, achieving average ACC, MCC and AUC-ROC values of 80.699%, 61.45% and 80.7% across all datasets.


“… 25 A number of computational methods have been developed for predicting epigenetic modifications of RNA. 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37 Among them, iRNAD is the first approach for D-site prediction across multiple species; it used a support vector machine to distinguish D sites from non-D sites. 34 Later, iRNAD_XGBoost used multiple features selected by XGBoost to construct a model for D detection.…”

Section: Introduction (mentioning)

confidence: 99%

“… 34 Later, iRNAD_XGBoost used multiple features selected by XGBoost to construct a model for D detection. 33 However, to the best of our knowledge, all existing D-site prediction tools 32, 33, 34, 35 were trained on tRNAs, and it is not clear whether they can be applied to predict D sites on mRNAs. Although recent studies have revealed the widespread occurrence and transcriptome-wide distribution of D (the D epitranscriptome), 14 there are still no prediction tools for mRNA D sites built from mRNA D datasets.…”

Section: Introduction (mentioning)

confidence: 99%

Self-attention enabled deep learning of dihydrouridine (D) modification on mRNAs unveiled a distinct sequence signature from tRNAs

Wang, Wang, Cui et al. 2023

Molecular Therapy - Nucleic Acids

