A spectacular anomaly in the 4-mer composition of the giant pandoravirus genomes reveals a stringent new evolutionary selection process

Running title: Unique compositional anomaly in pandoraviruses

Abstract

The Pandoraviridae is a rapidly growing family of giant viruses, all of which have been isolated using laboratory strains of Acanthamoeba. The genomes of ten distinct strains have been fully characterized, reaching up to 2.5 Mb in size. These double-stranded DNA genomes encode the largest of all known viral proteomes and are propagated in oblate virions that are among the largest ever described (1.2 µm long and 0.5 µm wide). The evolutionary origin of these atypical viruses is the subject of much speculation. Applying the Chaos Game Representation to the pandoravirus genome sequences, we discovered that the tetranucleotide (4-mer) "AGCT" is totally absent from the genomes of 2 strains (P. dulcis and P. quercus) and strongly underrepresented in others. Given the extremely low probability of such an observation in the corresponding randomized sequences, we investigated its biological significance through a comprehensive study of the 4-mer compositions of all viral genomes. Our results indicate that "AGCT" was specifically eliminated during the evolution of the Pandoraviridae and that none of the previously proposed host-virus antagonistic relationships could explain this phenomenon. Unlike the three other families of giant viruses (Mimiviridae, Pithoviridae, Molliviridae) infecting the same Acanthamoeba host, the pandoraviruses exhibit a puzzling genomic anomaly suggesting highly specific DNA editing in response to a new kind of strong evolutionary pressure.

Importance

Recent years have seen the discovery of several families of giant DNA viruses, all infecting the ubiquitous amoebozoa of the genus Acanthamoeba.
With dsDNA genomes reaching 2.5 Mb in length, packaged in oblate particles the size of a bacterium, the pandoraviruses are the largest and most complex viruses known to date. In addition to their spectacular dimensions, the pandoraviruses encode the largest proportion of proteins without homologs in other organisms, thought to result from a de novo gene creation process. While using comparative genomics to investigate the evolutionary forces responsible for the emergence of such an unusual giant virus family, we discovered a unique bias in the tetranucleotide composition of the pandoravirus genomes that can only result from an undescribed evolutionary process not encountered in any other microorganism.

The pandoraviruses are among the growing number of families of environmental giant DNA viruses infecting protozoans and isolated using the laboratory host Acanthamoeba (Protozoa/Lobosa/Amoebida/Acanthamoebidae/Acanthamoeba) 1-4. To date, they exhibit the largest fully characterized viral genomes, made of linear dsDNA molecules from 1.9 to 2.5 Mb in size, predicted...