DNA cytosine methylation is crucial for retrotransposon silencing and mammalian development. In a computational search for enzymes that could modify 5-methylcytosine (5mC), we identified TET proteins as mammalian homologs of the trypanosome proteins JBP1 and JBP2, which have been proposed to oxidize the 5-methyl group of thymine. We show here that TET1, a fusion partner of the MLL gene in acute myeloid leukemia, is a 2-oxoglutarate (2OG)- and Fe(II)-dependent enzyme that catalyzes conversion of 5mC to 5-hydroxymethylcytosine (hmC) in cultured cells and in vitro. hmC is present in the genome of mouse embryonic stem cells, and hmC levels decrease upon RNA interference–mediated depletion of TET1. Thus, TET proteins have potential roles in epigenetic regulation through modification of 5mC to hmC.
BackgroundProteinaceous toxins are observed across all levels of inter-organismal and intra-genomic conflicts. These include recently discovered prokaryotic polymorphic toxin systems implicated in intra-specific conflicts. They are characterized by a remarkable diversity of C-terminal toxin domains generated by recombination with standalone toxin-coding cassettes. Prior analysis revealed a striking diversity of nuclease and deaminase domains among the toxin modules. We systematically investigated polymorphic toxin systems using comparative genomics, sequence and structure analysis.ResultsPolymorphic toxin systems are distributed across all major bacterial lineages and are delivered by at least eight distinct secretory systems. In addition to type-II, these include type-V, VI, VII (ESX), and the poorly characterized “Photorhabdus virulence cassettes (PVC)”, PrsW-dependent and MuF phage-capsid-like systems. We present evidence that trafficking of these toxins is often accompanied by autoproteolytic processing catalyzed by HINT, ZU5, PrsW, caspase-like, papain-like, and a novel metallopeptidase associated with the PVC system. We identified over 150 distinct toxin domains in these systems. These span an extraordinary catalytic spectrum to include 23 distinct clades of peptidases, numerous previously unrecognized versions of nucleases and deaminases, ADP-ribosyltransferases, ADP ribosyl cyclases, RelA/SpoT-like nucleotidyltransferases, glycosyltranferases and other enzymes predicted to modify lipids and carbohydrates, and a pore-forming toxin domain. Several of these toxin domains are shared with host-directed effectors of pathogenic bacteria. Over 90 families of immunity proteins might neutralize anywhere between a single to at least 27 distinct types of toxin domains. In some organisms multiple tandem immunity genes or immunity protein domains are organized into polyimmunity loci or polyimmunity proteins. Gene-neighborhood-analysis of polymorphic toxin systems predicts the presence of novel trafficking-related components, and also the organizational logic that allows toxin diversification through recombination. Domain architecture and protein-length analysis revealed that these toxins might be deployed as secreted factors, through directed injection, or via inter-cellular contact facilitated by filamentous structures formed by RHS/YD, filamentous hemagglutinin and other repeats. Phyletic pattern and life-style analysis indicate that polymorphic toxins and polyimmunity loci participate in cooperative behavior and facultative ‘cheating’ in several ecosystems such as the human oral cavity and soil. Multiple domains from these systems have also been repeatedly transferred to eukaryotes and their viruses, such as the nucleo-cytoplasmic large DNA viruses.ConclusionsAlong with a comprehensive inventory of toxins and immunity proteins, we present several testable predictions regarding active sites and catalytic mechanisms of toxins, their processing and trafficking and their role in intra-specific and inter-specific interac...
The apicomplexan Cryptosporidium parvum is an intestinal parasite that affects healthy humans and animals, and causes an unrelenting infection in immunocompromised individuals such as AIDS patients. We report the complete genome sequence of C. parvum, type II isolate. Genome analysis identifies extremely streamlined metabolic pathways and a reliance on the host for nutrients. In contrast to Plasmodium and Toxoplasma, the parasite lacks an apicoplast and its genome, and possesses a degenerate mitochondrion that has lost its genome. Several novel classes of cell-surface and secreted proteins with a potential role in host interactions and pathogenesis were also detected. Elucidation of the core metabolism, including enzymes with high similarities to bacterial and plant counterparts, opens new avenues for drug development.
A previous comparative-genomic study of large nuclear and cytoplasmic DNA viruses (NCLDVs) of eukaryotes revealed the monophyletic origin of four viral families: poxviruses, asfarviruses, iridoviruses, and phycodnaviruses [Iyer, L.M., Aravind, L., Koonin, E.V., 2001. Common origin of four diverse families of large eukaryotic DNA viruses. J. Virol. 75 (23), 11720-11734]. Here we update this analysis by including the recently sequenced giant genome of the mimiviruses and several additional genomes of iridoviruses, phycodnaviruses, and poxviruses. The parsimonious reconstruction of the gene complement of the ancestral NCLDV shows that it was a complex virus with at least 41 genes that encoded the replication machinery, up to four RNA polymerase subunits, at least three transcription factors, capping and polyadenylation enzymes, the DNA packaging apparatus, and structural components of an icosahedral capsid and the viral membrane. The phylogeny of the NCLDVs is reconstructed by cladistic analysis of the viral gene complements, and it is shown that the two principal lineages of NCLDVs are comprised of poxviruses grouped with asfarviruses and iridoviruses grouped with phycodnaviruses-mimiviruses. The phycodna-mimivirus grouping was strongly supported by several derived shared characters, which seemed to rule out the previously suggested basal position of the mimivirus [Raoult, D., Audic, S., Robert, C., Abergel, C., Renesto, P., Ogata, H., La Scola, B., Suzan, M., Claverie, J.M. 2004. The 1.2-megabase genome sequence of Mimivirus. Science 306 (5700), 1344-1350]. These results indicate that the divergence of the major NCLDV families occurred at an early stage of evolution, prior to the divergence of the major eukaryotic lineages. It is shown that subsequent evolution of the NCLDV genomes involved lineage-specific expansion of paralogous gene families and acquisition of numerous genes via horizontal gene transfer from the eukaryotic hosts, other viruses, and bacteria (primarily, endosymbionts and parasites). Amongst the expansions, there are multiple families of predicted virus-specific signaling and regulatory domains. Most NCLDVs have also acquired large arrays of genes related to ubiquitin signaling, and the animal viruses in particular have independently evolved several defenses against apoptosis and immune response, including growth factors and potential inhibitors of cytokine signaling. The mimivirus displays an enormous array of genes of bacterial provenance, including a representative of a new class of predicted papain-like peptidases. It is further demonstrated that a significant number of genes found in NCLDVs also have homologs in bacteriophages, although a vertical relationship between the NCLDVs and a particular bacteriophage group could not be established. On the basis of these observations, two alternative scenarios for the origin of the NCLDVs and other groups of large DNA viruses of eukaryotes are considered. One of these scenarios posits an early assembly of an already large DNA virus precursor from whi...
Comparative analysis of the protein sequences encoded in the genomes of three families of large DNA viruses that replicate, completely or partly, in the cytoplasm of eukaryotic cells (poxviruses, asfarviruses, and iridoviruses) and phycodnaviruses that replicate in the nucleus reveals 9 genes that are shared by all of these viruses and 22 more genes that are present in at least three of the four compared viral families. Although orthologous proteins from different viral families typically show weak sequence similarity, because of which some of them have not been identified previously, at least five of the conserved genes appear to be synapomorphies (shared derived characters) that unite these four viral families, to the exclusion of all other known viruses and cellular life forms. Cladistic analysis with the genes shared by at least two viral families as evolutionary characters supports the monophyly of poxviruses, asfarviruses, iridoviruses, and phycodnaviruses. The results of genome comparison allow a tentative reconstruction of the ancestral viral genome and suggest that the common ancestor of all of these viral families was a nucleocytoplasmic virus with an icosahedral capsid, which encoded complex systems for DNA replication and transcription, a redox protein involved in disulfide bond formation in virion membrane proteins, and probably inhibitors of apoptosis. The conservation of the disulfide-oxidoreductase, a major capsid protein, and two virion membrane proteins indicates that the odd-shaped virions of poxviruses have evolved from the more common icosahedral virion seen in asfarviruses, iridoviruses, and phycodnaviruses.
The comparative genomics of apicomplexans, such as the malarial parasite Plasmodium, the cattle parasite Theileria and the emerging human parasite Cryptosporidium, have suggested an unexpected paucity of specific transcription factors (TFs) with DNA binding domains that are closely related to those found in the major families of TFs from other eukaryotes. This apparent lack of specific TFs is paradoxical, given that the apicomplexans show a complex developmental cycle in one or more hosts and a reproducible pattern of differential gene expression in course of this cycle. Using sensitive sequence profile searches, we show that the apicomplexans possess a lineage-specific expansion of a novel family of proteins with a version of the AP2 (Apetala2)-integrase DNA binding domain, which is present in numerous plant TFs. About 20–27 members of this apicomplexan AP2 (ApiAP2) family are encoded in different apicomplexan genomes, with each protein containing one to four copies of the AP2 DNA binding domain. Using gene expression data from Plasmodium falciparum, we show that guilds of ApiAP2 genes are expressed in different stages of intraerythrocytic development. By analogy to the plant AP2 proteins and based on the expression patterns, we predict that the ApiAP2 proteins are likely to function as previously unknown specific TFs in the apicomplexans and regulate the progression of their developmental cycle. In addition to the ApiAP2 family, we also identified two other novel families of AP2 DNA binding domains in bacteria and transposons. Using structure similarity searches, we also identified divergent versions of the AP2-integrase DNA binding domain fold in the DNA binding region of the PI-SceI homing endonuclease and the C-terminal domain of the pleckstrin homology (PH) domain-like modules of eukaryotes. Integrating these findings, we present a reconstruction of the evolutionary scenario of the AP2-integrase DNA binding domain fold, which suggests that it underwent multiple independent combinations with different types of mobile endonucleases or recombinases. It appears that the eukaryotic versions have emerged from versions of the domain associated with mobile elements, followed by independent lineage-specific expansions, which accompanied their recruitment to transcription regulation functions.
The helix-turn-helix (HTH) domain is a common denominator in basal and specific transcription factors from the three super-kingdoms of life. At its core, the domain comprises of an open tri-helical bundle, which typically binds DNA with the 3rd helix. Drawing on the wealth of data that has accumulated over two decades since the discovery of the domain, we present an overview of the natural history of the HTH domain from the viewpoint of structural analysis and comparative genomics. In structural terms, the HTH domains have developed several elaborations on the basic 3-helical core, such as the tetra-helical bundle, the winged-helix and the ribbon-helix-helix type configurations. In functional terms, the HTH domains are present in the most prevalent transcription factors of all prokaryotic genomes and some eukaryotic genomes. They have been recruited to a wide range of functions beyond transcription regulation, which include DNA repair and replication, RNA metabolism and protein-protein interactions in diverse signaling contexts. Beyond their basic role in mediating macromolecular interactions, the HTH domains have also been incorporated into the catalytic domains of diverse enzymes. We discuss the general domain architectural themes that have arisen amongst the HTH domains as a result of their recruitment to these diverse functions. We present a natural classification, higher-order relationships and phyletic pattern analysis of all the major families of HTH domains. This reconstruction suggests that there were at least 6-11 different HTH domains in the last universal common ancestor of all life forms, which covered much of the structural diversity and part of the functional versatility of the extant representatives of this domain. In prokaryotes the total number of HTH domains per genome shows a strong power-equation type scaling with the gene number per genome. However, the HTH domains in two-component signaling pathways show a linear scaling with gene number, in contrast to the non-linear scaling of HTH domains in single-component systems and sigma factors. These observations point to distinct evolutionary forces in the emergence of different signaling systems with HTH transcription factors. The archaea and bacteria share a number of ancient families of specific HTH transcription factors. However, they do not share any orthologous HTH proteins in the basal transcription apparatus. This differential relationship of their basal and specific transcriptional machinery poses an apparent conundrum regarding the origins of their transcription apparatus.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.