Distinct patterns of dinucleotide representation, such as CpG and UpA suppression, are characteristic of certain viral genomes. Recent research has uncovered vertebrate immune mechanisms that select against specific dinucleotides in targeted viruses. This evidence highlights the importance of systematically examining the dinucleotide composition of viral genomes. We have developed a novel metric, called Synonymous Dinucleotide Usage (SDU), for quantifying dinucleotide representation in coding sequences. Our method compares the abundance of a given dinucleotide to the null hypothesis of equal synonymous codon usage in the sequence. We present a Python3 package, DinuQ, for calculating SDU and other relevant metrics. We have applied this method to two sets of invertebrate- and vertebrate-specific flaviviruses and rhabdoviruses. The SDU shows that the vertebrate viruses exhibit consistently greater under-representation of CpG dinucleotides in all three codon positions in both datasets. In comparison to existing metrics for dinucleotide quantification, the SDU allows for a biologically informed interpretation of its values by comparing representation to an expectation based on the codon table. Here we apply the method to viruses, but coding sequences of other living organisms can be analysed in the same way.

Introduction

Certain dinucleotides, two nucleotides adjacent in a sequence, are known to be over- or under-represented in the genomes of living organisms, creating distinct compositional patterns [1].
Organisms with methylated genomes such as vertebrates and plants have low levels of CpG dinucleotides. This is not the case for methylase-absent organisms like invertebrates, bacteria and fungi [2,3]. The cause of CpG suppression lies in the DNA methylation mechanisms of vertebrates and plants, where cytosines methylated by DNA methyltransferases are frequently converted to thymine through deamination [4,5]. UpA depletion is consistently present in most living organisms, including prokaryotes. This bias is suspected to be due to UpA-rich mRNA being unstable and more prone to degradation by cytoplasmic RNases [6,7].

Similar patterns of dinucleotide composition have been observed in RNA and DNA viruses, and appear to have a functional role in the infection and propagation of the viruses [8]. Studies have shown that experimentally increasing UpA and CpG levels in RNA viruses leads to a decrease in replication and subsequent viral attenuation, while decreasing their abundance has the opposite effect [9-11]. Influenza viruses with CpG- or UpA-enriched sequences have also been shown to replicate less efficiently in vivo while causing a more powerful immune response and, in the case of CpG enrichment, reduced clinical severity [12]. A similar decrease in replication and virulence has been observed in CpG-/UpA-increased yellow fever virus [13].
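The over- and under-representation discussed above is commonly quantified with a generic observed/expected odds ratio, f(XY) / (f(X) f(Y)), where f denotes a frequency in the sequence. The following is a minimal Python sketch of that generic ratio only. It is not the SDU metric and is not taken from DinuQ; the function name `dinucleotide_odds_ratio` is a hypothetical helper introduced here for illustration.

```python
from collections import Counter


def dinucleotide_odds_ratio(seq: str, dinuc: str) -> float:
    """Return f(XY) / (f(X) * f(Y)) for dinucleotide XY in seq.

    Hypothetical helper for illustration; not part of DinuQ and not SDU.
    """
    seq = seq.upper()
    dinuc = dinuc.upper()
    if len(dinuc) != 2:
        raise ValueError("dinuc must be exactly two characters")
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two nucleotides")

    x, y = dinuc
    base_counts = Counter(seq)

    # Count overlapping occurrences across all adjacent pairs.
    n_pairs = len(seq) - 1
    observed = sum(1 for i in range(n_pairs) if seq[i:i + 2] == dinuc)

    f_xy = observed / n_pairs
    f_x = base_counts[x] / len(seq)
    f_y = base_counts[y] / len(seq)
    if f_x == 0 or f_y == 0:
        raise ValueError("a nucleotide of the dinucleotide is absent from seq")

    # Expected frequency assumes the two nucleotides occur independently.
    return f_xy / (f_x * f_y)
```

Under this ratio, values below 1 indicate that the dinucleotide occurs less often than expected from the single-nucleotide frequencies, and values above 1 indicate the opposite. Unlike a codon-aware approach, this ratio ignores the coding frame and amino acid composition of the sequence.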
Recently, a host immune mechanism of vertebrates that acts on viral genomes at the dinucleotide level has been uncovered. The Zinc-finge...