Eukaryotic genomes must balance compact packaging, which supports genome stability and inheritance, against accessibility for gene expression. They do so using post-translational modifications of four ancient canonical histone proteins (H2A, H2B, H3 and H4), and by deploying histone variants with specialized chromatin functions. While some histone variants are highly conserved across eukaryotes, others carry out lineage-specific functions. Here, we characterize the evolution of male germline-specific "short H2A variants", which wrap shorter DNA fragments than canonical H2A. In addition to the three previously described variants H2A.B, H2A.L and H2A.P, we describe a novel, extremely short H2A histone variant: H2A.Q. We show that H2A.B, H2A.L, H2A.P and H2A.Q are most closely related to a novel, more canonical mmH2A variant found only in monotremes and marsupials. Using phylogenomics, we trace the origins and early diversification of short histone variants into four distinct clades to the ancestral X chromosome of placental mammals. We show that short H2A variants further diversified by repeated lineage-specific amplifications and losses, including pseudogenization of H2A.L in many primates. We also uncover evidence for concerted evolution of H2A.B and H2A.L genes by gene conversion in many species, involving loci separated by large distances. Finally, we find that short H2As evolve more rapidly than any other histone variant, with evidence that positive selection has acted upon H2A.P in primates. Based on their X chromosomal location and pattern of genetic innovation, we speculate that short H2A histone variants are engaged in a form of genetic conflict involving the mammalian sex chromosomes.
Introduction

Nucleosomes are the basic unit of chromatin in practically all eukaryotes. A typical nucleosome particle wraps 150 bp of DNA around an octamer of four histone proteins: H3, H4, H2A and H2B (Kornberg 1974; Kornberg and Thomas 1974; Malik and Henikoff 2003). These four canonical "core" histone proteins have a stereotypical structure characterized by alpha-helices comprising a "histone fold domain" (HFD) (Luger, et al. 1997). During nucleosome assembly, the HFD mediates dimerization of H3 with H4, and that of H2A with H2B, and contains the nucleosome-DNA interface. Nucleosomes thus comprise a central core of an H3-H4 tetramer, flanked by two H2A-H2B dimers. In contrast to their HFD, the N- and C-terminal tails of canonical histones are far less structured and remain solvent-exposed in the nucleosome.

bioRxiv preprint first posted online Jul. 20, 2017; doi: http://dx.doi.org/10.1101/165936. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC 4.0 International license.

Nucleosomes can prevent other cellular factors, such as transcription factors, from interacting with DNA. While post-translational modifications of histone tails play an important role in regulating these interactions, eukaryotic genomes also establish diverse chroma...