Repetitive DNA motifs are abundant in the genomes of various species and have the capacity to adopt non-canonical (i.e. non-B) DNA structures. Several non-B DNA structures, including cruciforms, slipped structures, triplexes, G-quadruplexes, and Z-DNA, have been shown to cause mutations, such as deletions, expansions, and translocations in both prokaryotes and eukaryotes. Their distributions in genomes are not random and often co-localize with sites of chromosomal breakage associated with genetic diseases. Current genome-wide sequence analyses suggest that the genomic instabilities induced by non-B DNA structure-forming sequences not only result in predisposition to disease, but also contribute to rapid evolutionary changes, particularly in genes associated with development and regulatory functions. In this review, we describe the occurrence of non-B DNA-forming sequences in various species, the classes of genes enriched in non-B DNA-forming sequences, and recent mechanistic studies on DNA structure-induced genomic instability to highlight their importance in genomes.
The history of investigations on non-B DNA conformations as related to genetic diseases dates back to the mid-1960s. Studies with high molecular weight DNA polymers of defined repeating nucleotide sequences demonstrated the role of sequence in their properties and conformations (1). Investigations with homo-, di-, tri-, and tetranucleotide repeating motifs revealed the powerful role of sequence in molecular behaviors. At that time, this concept was heretical because numerous prior investigations with naturally occurring DNA sequences masked the effect of sequence (1). It may be noted that these studies in the 1960s predated DNA sequencing by at least a decade. Early studies were followed by a number of innovative discoveries on DNA conformational features in synthetic oligomers, restriction fragments, and recombinant DNAs. The DNA polymorphisms were a function of sequence, topology (supercoil density), ionic conditions, protein binding, methylation, carcinogen binding, and other factors (2). A number of non-B DNA structures have been discovered (approximately one new conformation every 3 years for the past 35 years) and include the following: triplexes, left-handed DNA, bent DNA, cruciforms, nodule DNA, flexible and writhed DNA, G4 tetrad (tetraplexes), slipped structures, and sticky DNA (Fig. 1). From the outset, it was realized (1, 2) that these sequence effects probably have profound biological implications, and indeed their role in transcription (3) and in the maintenance of telomere ends (4) has recently been reviewed. However, in the past few years dramatic advances from genomics, human genetics, medicine, and DNA structural biology have revealed the role of non-B conformations in the etiology of at least 46 human genetic diseases (Table I) that involve genomic rearrangements as well as other types of mutation events.
Non-B DNA Conformations
Segments of DNA are polymorphic.
A large number of simple DNA repeat sequences can exist in at least two conformations. All of these sequences adopt the orthodox right-handed B form, probably for the majority of the time, with Watson-Crick (WC) A·T and G·C bp. However, at least 10 non-B conformations (5-7) are formed, perhaps transiently, at specific sequence motifs as a function of negative supercoil density, generated in part by transcription, protein binding, and other factors. Non-B DNA structures (Fig.
A method is described to express and purify human DNA (cytosine-5) methyltransferase (human DNMT1) using a protein splicing (intein) fusion partner in a baculovirus expression vector. The system produces ~1 mg of intact recombinant enzyme >95% pure per 1.5 × 10^9 insect cells. The protein lacks any affinity tag and is identical to the native enzyme except for the two C-terminal amino acids, proline and glycine, that were substituted for lysine and aspartic acid for optimal cleavage from the intein affinity tag. Human DNMT1 was used for steady-state kinetic analysis with poly(dI-dC)·poly(dI-dC) and unmethylated and hemimethylated 36- and 75-mer oligonucleotides. The turnover number (k_cat) was 131-237 h^-1 on poly(dI-dC)·poly(dI-dC), 1.2-2.3 h^-1 on unmethylated DNA, and 8.3-49 h^-1 on hemimethylated DNA. The Michaelis constants for DNA (K_m^CG) and S-adenosyl-L-methionine (AdoMet) (K_m^AdoMet) ranged from 0.33-1.32 and 2.6-7.2 µM, respectively, whereas the ratio of k_cat/K_m^CG ranged from 3.9 to 44 (237-336 for poly(dI-dC)·poly(dI-dC)) × 10^6 M^-1 h^-1. The preference of the enzyme for hemimethylated, over unmethylated, DNA was 7-21-fold. The values of k_cat on hemimethylated DNAs showed a 2-3-fold difference, depending upon which strand was pre-methylated. Furthermore, human DNMT1 formed covalent complexes with substrates containing 5-fluoro-CNG, indicating that substrate specificity extended beyond the canonical CG dinucleotide. These results show that, in addition to maintenance methylation, human DNMT1 may also carry out de novo and non-CG methyltransferase activities in vivo. Methylated cytosine is found in the genome of organisms ranging from prokaryotes to mammals (1). Methylation of DNA in eukaryotes is implicated in various biological and developmental processes, such as gene regulation (2), DNA replication (3), genomic imprinting (4), embryonic development (5), carcinogenesis (6), and genetic diseases (7).
The bulk of the methylation takes place during DNA replication in the S-phase of the cell cycle (8). The maintenance methylation ensures the propagation of tissue-specific methylation patterns established during mammalian development. The methyl transfer reaction proceeds via nonspecific binding of the enzyme to DNA, recognition of the specific DNA target site, and recruitment of the methyl group donor S-adenosyl-L-methionine (AdoMet) to the active site of the enzyme. DNA (cytosine-5) methyltransferases (m5C MTase) introduce a methyl group onto carbon 5 of the target cytosine through a covalent intermediate between the protein and the target cytosine (9). During this process, the cytosine is flipped 180° out of the DNA backbone into an active site pocket of the enzyme (10). After completion of the methyl transfer reaction, the products, methylated DNA and S-adenosyl-L-homocysteine (AdoHcy), are released. Previous studies on the mechanism of methylation were mainly limited to prokaryotic m5C MTases, although some limited kinetic studies have also been reported with mouse and human DNMT1 (11,12...
Genomic rearrangements are a frequent source of instability, but the mechanisms involved are poorly understood. A 2.5-kbp poly(purine·pyrimidine) sequence from the human PKD1 gene, known to form non-B DNA structures, induced long deletions and other instabilities in plasmids that were mediated by mismatch repair and, in some cases, transcription. The breakpoints occurred at predicted non-B DNA structures. Distance measurements also indicated a significant proximity of alternating purine-pyrimidine and oligo(purine·pyrimidine) tracts to breakpoint junctions in 222 gross deletions and translocations, respectively, involved in human diseases. In 11 deletions analyzed, breakpoints were explicable by non-B DNA structure formation. We conclude that alternative DNA conformations trigger genomic rearrangements through recombination-repair activities. Gross chromosomal rearrangements are a common source of genetic instability (1). Thus, characterization of the underlying molecular mechanisms of mutagenesis is fundamental for our understanding of human disease. A hallmark of gross deletions is the presence of short homologous tracts (typically 2-8 bp) at the breakpoints (2), a finding that has prompted speculation as to the two distinct mechanisms postulated to be responsible for their formation. The slipped mispairing model (3) envisages that during lagging strand DNA synthesis, distantly located repeats are brought into close proximity by the looping out of the single-stranded region, thereby enabling the replication complex to "jump" from the proximal to the distal repeat and hence bypass the looped structure. Alternative models propose that various types of repetitive sequence elements may serve as substrates for intra- or intermolecular recombination (2, 4).
Neither model is satisfactory; slipped mispairing is inconsistent with deletions greater than ≈500 bp and deletions manifesting <4-bp homologies (5-9), whereas the recombination model does not provide a rationale for the initiation of the process. Specific sequence motifs such as direct and inverted repeats, (RY·RY)n and (R·Y)n, in which R represents purine and Y represents pyrimidine, and four closely spaced G-rich direct repeats [i.e., (G·C)3] undergo structural transitions from the orthodox right-handed B-helical duplex to higher energy state non-B structures (slipped hairpin/loops, cruciforms, left-handed Z-helices, triplexes, and tetraplexes, respectively) (10-12) under torsional stress (negative supercoiling) in vivo. Early articles in bacteria and hamster cells reported isolated cases in which deletions could occur by a recombination-repair reaction mediated by cruciform structures forming at each breakpoint (13,14). Recently, the breakpoint junctions of the human constitutional translocations t(1;22), t(4;22), t(11;22), and t(17;22), which involve a common locus on chromosome 22q11.2, were found to coincide with large (>95 bp) cruciform structures (15-18), suggesting that this conformation may predispose specific loci to genomic rearrangements. The po...
Microsatellites are abundant in vertebrate genomes, but their sequence representation and length distributions vary greatly within each family of repeats (e.g., tetranucleotides). Biophysical studies of 82 synthetic single-stranded oligonucleotides comprising all tetra- and trinucleotide repeats revealed an inverse correlation between the stability of folded-back hairpin and quadruplex structures and the sequence representation for repeats ≥30 bp in length in nine vertebrate genomes. Alternatively, the predicted energies of base-stacking interactions correlated directly with the longest length distributions in vertebrate genomes. Genome-wide analyses indicated that unstable sequences, such as CAG:CTG and CCG:CGG, were over-represented in coding regions and that micro/minisatellites were recruited in genes involved in transcription and signaling pathways, particularly in the nervous system. Microsatellite instability (MSI) is a hallmark of cancer, and length polymorphism within genes can confer susceptibility to inherited disease. Sequences that manifest the highest MSI values also displayed the strongest base-stacking interactions; analyses of 62 tri- and tetranucleotide repeat-containing genes associated with human genetic disease revealed enrichments similar to those noted for micro/minisatellite-containing genes. We conclude that DNA structure and base-stacking determined the number and length distributions of microsatellite repeats in vertebrate genomes over evolutionary time and that micro/minisatellites have been recruited to participate in both gene and protein function.
Gross chromosomal rearrangements (including translocations, deletions, insertions and duplications) are a hallmark of cancer genomes and often create oncogenic fusion genes. An obligate step in the generation of such gross rearrangements is the formation of DNA double-strand breaks (DSBs). Since the genomic distribution of rearrangement breakpoints is non-random, intrinsic cellular factors may predispose certain genomic regions to breakage. Notably, certain DNA sequences with the potential to fold into secondary structures [potential non-B DNA structures (PONDS); e.g. triplexes, quadruplexes, hairpin/cruciforms, Z-DNA and single-stranded looped-out structures with implications in DNA replication and transcription] can stimulate the formation of DNA DSBs. Here, we tested the postulate that these DNA sequences might be found at, or in close proximity to, rearrangement breakpoints. By analyzing the distribution of PONDS-forming sequences within ±500 bases of 19 947 translocation and 46 365 sequence-characterized deletion breakpoints in cancer genomes, we find significant association between PONDS-forming repeats and cancer breakpoints. Specifically, (AT)n, (GAA)n and (GAAA)n constitute the most frequent repeats at translocation breakpoints, whereas A-tracts occur preferentially at deletion breakpoints. Translocation breakpoints near PONDS-forming repeats also recur in different individuals and patient tumor samples. Hence, PONDS-forming sequences represent an intrinsic risk factor for genomic rearrangements in cancer genomes.
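The windowed search described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function names, the `min_copies` threshold, and the exact-match regular expression are assumptions introduced here. Only the ±500-base window and the (AT)n, (GAA)n and (GAAA)n repeat units come from the text.

```python
import re


def find_tandem_repeats(seq, unit, min_copies=3):
    """Return (start, end) spans where `unit` occurs at least
    `min_copies` times in a row (exact, uninterrupted copies only)."""
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(unit), min_copies))
    return [m.span() for m in pattern.finditer(seq.upper())]


def repeats_near_breakpoint(seq, breakpoint, units=("AT", "GAA", "GAAA"),
                            flank=500, min_copies=3):
    """Scan the window [breakpoint - flank, breakpoint + flank) of `seq`
    and report tandem-repeat spans in coordinates of the full sequence.

    `min_copies` is a hypothetical threshold chosen for illustration.
    """
    lo = max(0, breakpoint - flank)          # clip window at sequence start
    hi = min(len(seq), breakpoint + flank)   # clip window at sequence end
    window = seq[lo:hi]
    hits = {}
    for unit in units:
        spans = find_tandem_repeats(window, unit, min_copies)
        # Shift window-relative spans back to full-sequence coordinates.
        hits[unit] = [(start + lo, end + lo) for start, end in spans]
    return hits
```

Calling `repeats_near_breakpoint(sequence, position)` returns a dictionary keyed by repeat unit. A real analysis would additionally need a background model (for example, randomly placed control positions) before any enrichment could be called significant.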
Repetitive DNA motifs may fold into non-B DNA structures, including cruciforms/hairpins, triplexes, slipped conformations, quadruplexes, and left-handed Z-DNA, thereby representing chromosomal targets for DNA repair, recombination, and aberrant DNA synthesis leading to repeat expansion or genomic rearrangements associated with neurodegenerative and genomic disorders. Hairpins and quadruplexes also determined the relative abundances of simple sequence repeats (SSR) in vertebrate genomes, whereas strong base stacking has permitted the expansion of purine·pyrimidine-rich SSR during evolutionary time. SSR are enriched in regulatory and cancer-related gene classes, where they have been actively recruited to participate in both gene and protein functions. SSR polymorphic alleles in the population are associated with cancer susceptibility, including within genes that appear to share regulatory circuits involving reactive oxygen species.
The non-B DB, available at http://nonb.abcc.ncifcrf.gov, catalogs predicted non-B DNA-forming sequence motifs, including Z-DNA, G-quadruplex, A-phased repeats, inverted repeats, mirror repeats, direct repeats and their corresponding subsets: cruciforms, triplexes and slipped structures, in several genomes. Version 2.0 of the database revises and re-implements the motif discovery algorithms to better align with accepted definitions and thresholds for motifs, expands the non-B DNA-forming motifs coverage by including short tandem repeats and adds key visualization tools to compare motif locations relative to other genomic annotations. Non-B DB v2.0 extends the ability for comparative genomics by including re-annotation of the five organisms reported in non-B DB v1.0, human, chimpanzee, dog, macaque and mouse, and adds seven additional organisms: orangutan, rat, cow, pig, horse, platypus and Arabidopsis thaliana. Additionally, the non-B DB v2.0 provides an overall improved graphical user interface and faster query performance.
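As an illustration of what sequence-based motif prediction can look like, the sketch below searches for candidate G-quadruplex motifs with a regular expression. The pattern used (four runs of at least three guanines separated by loops of 1-7 bases) is a widely used heuristic and is an assumption here; it is not claimed to be the definition or threshold that non-B DB applies, and the function name is hypothetical.

```python
import re

# Assumed heuristic: G{3,} N{1,7} G{3,} N{1,7} G{3,} N{1,7} G{3,}
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_g4_motifs(seq):
    """Return (start, end, matched_sequence) for each non-overlapping
    candidate G-quadruplex motif on the given strand."""
    return [(m.start(), m.end(), m.group())
            for m in G4_PATTERN.finditer(seq.upper())]
```

Only the strand passed in is scanned; motifs on the complementary strand would appear as C-rich runs and could be found by applying the same function to the reverse complement.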