2013
DOI: 10.1093/nar/gkt144

Comparative analysis using K-mer and K-flank patterns provides evidence for CpG island sequence evolution in mammalian genomes

Abstract: CpG islands are GC-rich regions often located in the 5′ end of genes and normally protected from cytosine methylation in mammals. The important role of CpG islands in gene transcription strongly suggests evolutionary conservation in the mammalian genome. However, as CpG dinucleotides are over-represented in CpG islands, comparative CpG island analysis using conventional sequence analysis techniques remains a major challenge in the epigenetics field. In this study, we conducted a comparative analysis of all CpG…

Cited by 24 publications (17 citation statements)
References 27 publications

“…Since a unique read is assigned a unique quadruplet, the deduplexing can be done efficiently. Nubeam can be effective in some areas where the K-mer approach is useful, for example, characterization of protein binding sequence motif (Newburger and Bulyk, 2009), characterizing CpG island by the flanking regions (Chae et al, 2013), and characterizing sequence feature for haplotype grouping (Navarro-Gomez et al, 2015).…”
Section: Discussion
mentioning
confidence: 99%
“…In this paper, we propose a novel task of variable-length k-mer profiling. While the necessity of diversifying k-mer lengths has already been demonstrated in many studies [6,16,29], most of the existing works only support fixed-length k-mers and need an enormous amount of memory, disk space, and time to profile k-mers with a wide range of k's. By leveraging the techniques of binarization and rolling hash for Aho-Corasick automaton, we construct a thinned Aho-Corasick automaton accelerated by rolling hash (TahcoRoll) to profile variable-length k-mers in genomic data.…”
Section: Discussion
mentioning
confidence: 99%
“…The best k to characterize different genomic regions can vary. Chae et al [6] have shown that it is necessary to consider patterns of 3-to 10-mers to construct the phylogenetic tree. Rahman et al [29] have proposed to merge the differential occurred k -mers to form longer and variable-length sequences for downstream analysis.…”
mentioning
confidence: 99%
“…This work was motivated by our previous works in modeling DNA methylation susceptibility [26–28] and conservation of CpG island sequences [29]. We and many scientists believe that DNA methylation is not random and probably there is an instructive mechanisms embedded in the genomic sequences [30].…”
Section: Motivation
mentioning
confidence: 99%
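
Several of the citation statements above refer to profiling k-mers over a range of lengths (for example, 3- to 10-mers). As a rough illustration of what that involves, here is a minimal Python sketch that counts overlapping k-mers for each k in a range. It is not the method of the cited paper or of any citing tool; the function names `count_kmers` and `profile_kmers`, the default range, and the choice to skip windows containing non-ACGT characters are all assumptions made for this example.

```python
from collections import Counter

DNA_ALPHABET = set("ACGT")  # assumption: only unambiguous bases are counted


def count_kmers(seq, k):
    """Count overlapping k-mers in seq, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= DNA_ALPHABET:
            counts[kmer] += 1
    return counts


def profile_kmers(seq, k_min=3, k_max=10):
    """Return a dict mapping each k in [k_min, k_max] to its k-mer counts."""
    return {k: count_kmers(seq, k) for k in range(k_min, k_max + 1)}


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the call pattern.
    profile = profile_kmers("ACGCGTACGCGT", k_min=3, k_max=5)
    for k, counts in profile.items():
        print(k, counts.most_common(3))
```

This naive version stores a separate table per k, so memory grows with the number of lengths profiled; that cost is the motivation one of the citing papers gives for a dedicated variable-length approach.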