Gene sequences in the vicinity of splice sites are found to possess dinucleotide periodicities, especially of RR and YY, with a period close to the pitch of nucleosome DNA. This confirms the previously reported finding that splice junctions are preferentially positioned within nucleosomes. The RR and YY dinucleotides oscillate in counterphase, i.e., their respective preferred positions are shifted by about half a period from one another, as was observed earlier for AA and TT dinucleotides. Species specificity of the nucleosome positioning DNA pattern is indicated by the predominant use of periodical GG(CC) dinucleotides in human and mouse genes, as opposed to predominant AA(TT) dinucleotides in Arabidopsis and C. elegans.
Positional distributions of various dinucleotides in experimentally derived human nucleosome DNA sequences are analyzed. Nucleosome positioning in this species is found to depend largely on GG and CC dinucleotides distributed periodically along the nucleosome DNA sequence, with a period of 10.4 bases. The GG and CC dinucleotides oscillate in counterphase, i.e., their respective preferred positions are shifted by about half a period from one another, as was observed earlier for AA and TT dinucleotides. Other purine-purine and pyrimidine-pyrimidine dinucleotides (RR and YY) display the same periodical, counterphase pattern. The dominance of oscillating GG and CC dinucleotides in human nucleosomes, together with the contributions of AG(CT), GA(TC), and AA(TT), suggests a general nucleosome DNA sequence pattern: counterphase oscillation of RR and YY dinucleotides. AA and TT dinucleotides, commonly accepted as major players, are only weak contributors in the case of human nucleosomes.
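The kind of positional analysis described above can be illustrated with a minimal sketch. It assumes the nucleosome sequences are already aligned and of similar length, counts how often a dinucleotide starts at each position, and then measures the amplitude and phase of the count profile at a chosen period. The function names and the toy sequences are hypothetical and are not taken from the paper; the toy data use a period of 10 rather than 10.4 purely for convenience.

```python
import math

def positional_counts(seqs, dinucs):
    """For each position, count the aligned sequences in which one of `dinucs` starts there."""
    length = min(len(s) for s in seqs) - 1
    counts = [0] * length
    for s in seqs:
        for i in range(length):
            if s[i:i + 2] in dinucs:
                counts[i] += 1
    return counts

def periodic_component(counts, period):
    """Return (amplitude, phase) of the mean-centered count profile at the given period."""
    mean = sum(counts) / len(counts)
    re = sum((c - mean) * math.cos(2 * math.pi * i / period) for i, c in enumerate(counts))
    im = sum((c - mean) * math.sin(2 * math.pi * i / period) for i, c in enumerate(counts))
    return math.hypot(re, im), math.atan2(im, re)

# Toy data (an assumption for illustration): GG every 10 bases, CC shifted by 5.
toy_seqs = [("GG" + "AAA" + "CC" + "AAA") * 10] * 3
gg_amp, gg_phase = periodic_component(positional_counts(toy_seqs, {"GG"}), 10)
cc_amp, cc_phase = periodic_component(positional_counts(toy_seqs, {"CC"}), 10)
# A phase difference close to pi corresponds to a half-period (counterphase) shift.
phase_difference = gg_phase - cc_phase
```

In this sketch, counterphase oscillation shows up as a phase difference near pi between the two profiles. Real data would need a scan over candidate periods and some normalization for sequence composition, neither of which is attempted here.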
Alu sequences carry a periodical pattern in which CG dinucleotides (CpG) repeat every 31-32 bases. Similar distances are observed in the distribution of DNA curvature in crystallized nucleosomes, at positions +/-1.5 and +/-4.5 periods of DNA from the nucleosome DNA dyad. Since CG elements are also found to impart higher stability to nucleosomes when positioned at the +/-1.5 sites, this suggests that CG dinucleotides may modulate nucleosome strength when they are methylated. Thus, Alu sequences may harbor special epigenetic nucleosomes with methylation-dependent regulatory functions. A nucleosome DNA sequence probe is suggested for detecting the locations of such regulatory nucleosomes in sequences.
A sequence analysis-oriented binary search-like algorithm was transformed into a sensitive and accurate tool for processing whole-genome data. The advantage of the algorithm over previous methods is its ability to detect the margins of both short and long genome fragments enriched in up-regulated signals with equal accuracy. The score of an enriched genome fragment reflects the difference between the actual concentration of up-regulated signals in the fragment and the chromosome signal baseline. The "divide-and-conquer"-type algorithm detects a series of nonintersecting fragments of various lengths with locally optimal scores. The procedure is applied to the detected fragments in a nested manner by recalculating the lower-than-baseline signals in the chromosome. The algorithm was applied to simulated whole-genome data, and its sensitivity and specificity were compared with those of several alternative algorithms. The algorithm was also tested with four biological tiling array datasets: Arabidopsis (i) expression and (ii) histone 3 lysine 27 trimethylation ChIP-on-chip datasets, and Saccharomyces cerevisiae (iii) spliced intron data and (iv) chromatin remodeling factor binding sites. The results of the analyses demonstrate the power of the algorithm in identifying both short up-regulated fragments (such as exons and transcription factor binding sites) and long, even moderately up-regulated, zones at their precise genome margins. The algorithm generates an accurate whole-genome landscape that could be used for cross-comparison of signals across the same genome in evolutionary and general genomic studies.
Keywords: genome segmentation | tiling array | next-generation sequencing
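To make the scoring and divide-and-conquer idea concrete, here is a minimal sketch. It is not the published procedure: it swaps in a recursive maximum-scoring-segment search (Kadane's algorithm applied to the region, then to the parts left and right of the best fragment). The score of a fragment is assumed to be the sum of signal minus baseline over its positions, and `min_score` is a hypothetical cutoff introduced only for illustration. The nested re-analysis of detected fragments mentioned in the abstract is not covered.

```python
def best_segment(scores, lo, hi):
    """Kadane's algorithm on scores[lo:hi]; returns (score, start, end), end exclusive."""
    best = (0.0, lo, lo)
    run, run_start = 0.0, lo
    for i in range(lo, hi):
        if run <= 0:
            # A non-positive running sum cannot help; restart the segment here.
            run, run_start = 0.0, i
        run += scores[i]
        if run > best[0]:
            best = (run, run_start, i + 1)
    return best

def enriched_fragments(signal, baseline, min_score):
    """Return nonintersecting (start, end, score) fragments, ordered by position."""
    scores = [x - baseline for x in signal]
    found = []

    def recurse(lo, hi):
        if lo >= hi:
            return
        score, start, end = best_segment(scores, lo, hi)
        if score < min_score or end == start:
            return
        recurse(lo, start)          # search left of the best fragment
        found.append((start, end, score))
        recurse(end, hi)            # search right of the best fragment

    recurse(0, len(scores))
    return found

# Toy signal (an assumption): a short strong peak and a longer moderate zone.
toy_signal = [0] * 2 + [5, 6, 5] + [0] * 8 + [2] * 6 + [0] * 3
toy_fragments = enriched_fragments(toy_signal, baseline=1, min_score=3)
```

In the toy signal, both the short strong peak and the longer moderate zone are reported with their own margins, because each is the best-scoring fragment within its own sub-region. The recursion is kept simple for readability and would need an iterative form for chromosome-scale inputs.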
A large portion of a typical eukaryotic genome consists of repetitive sequences. A common situation, in which several related but different repeat families share the same conserved motif, complicates repeat classification and the definition of repeat boundaries. If the repeats are aligned by the motif position, the sequence profile (pattern) resulting from the alignment is a superposition of the profiles (patterns) of the individual families. A novel algorithm for the decomposition of overlapping patterns is proposed. It can be used with both continuous and gapped patterns. The technique is based on the accumulation of simultaneously occurring pattern features found by a cross-correlation procedure with limited lag length; hence the name Cumulative Local Cross-Correlation (hereafter CLCC). Its sensitivity is tested on human genomic sequences. A software implementation of the algorithm is available on request from the author.
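The building block named above, cross-correlation with a limited lag length, can be sketched as follows. This is only the generic operation applied to two motif-occurrence tracks; it does not reproduce the CLCC accumulation or decomposition steps, whose details are not given in the abstract. The function names and the toy sequence are hypothetical.

```python
def indicator(seq, motif):
    """Binary track: 1 where `motif` starts at that position in `seq`, else 0."""
    return [1 if seq.startswith(motif, i) else 0 for i in range(len(seq))]

def limited_lag_cross_correlation(x, y, max_lag):
    """Cross-correlate two equal-length tracks for lags -max_lag..max_lag only."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += (x[i] - mean_x) * (y[j] - mean_y)
        result[lag] = total
    return result

# Toy sequence (an assumption): "GGC" follows "ACG" at a fixed distance of 8.
toy_seq = "ACGTTTTTGGCAAAAA" * 2
cc = limited_lag_cross_correlation(indicator(toy_seq, "ACG"),
                                   indicator(toy_seq, "GGC"), max_lag=10)
best_lag = max(cc, key=cc.get)
```

Restricting the lag range keeps the comparison local, so only features that co-occur within a short distance of each other contribute, which is the property the abstract relies on.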