Shigellosis is a highly infectious disease caused by bacteria of the genus Shigella and transmitted mainly via the faecal-oral route. Four species have been identified in the genus Shigella, among which S. flexneri used to be the most prevalent globally and is commonly isolated in developing countries. However, for reasons that remain unclear, it is being replaced by S. sonnei, which is currently the main causative agent of the dysentery pandemic in many emerging industrialized countries in regions such as Asia and the Middle East. To better understand S. sonnei virulence and antibiotic resistance, we sequenced 12 clinical S. sonnei strains with varied antibiotic-resistance profiles collected from four cities in Jiangsu Province, China. Phylogenomic [...]

Shigellosis [...] phenotypes. Sequenced S. sonnei genomes were aligned and compared with the reference strain S. sonnei 53G. The relationship between antibiotic resistance and virulence was studied by combining antibiotic resistance profiles with the distribution of putative virulence factors. Here, virulence factors are gene products that enable a microorganism to establish itself on or within a host of a particular species and that enhance its ability to cause disease; they are divided into 4 categories and 31 functional groups, such as bacterial toxins, cell surface proteins, and hydrolytic enzymes [8]. All the virulence factors come from 32 major bacterial pathogens, including the Shigella genus. By screening and comparing the genomes of antibiotic-sensitive and antibiotic-resistant S. sonnei strains using this hierarchical set of VF sequence models, we attempted to quantify virulence by the number of virulence factors in specific functional groups. Principal component analysis was then performed to cluster the resistant and sensitive strains separately, incorporating both the number of virulence
Pan- and phylo-genomic analysis

The total pan-genome of the 12 S. sonnei strains includes 5608 protein CDSs. Of those, 3893 (69.42% of total CDSs) are core genes shared across all 12 strains, while 1715 (30.58% of total CDSs) constitute the accessory fraction, which differs from genome to genome. Strain S13029 has the lowest number of unique genes (484 CDSs) and S14031 has the highest (803 CDSs) (Supplementary Figure 2). Interestingly, both strains are either completely sensitive or resistant to only one of the tested antibiotics. Further comparison of all 12 S. sonnei strains gives a complete map of gene presence and absence in each genome (Dataset 1). By comparing sensitive strains with resistant strains, unique genes associated with each of the two groups were identified and annotated based on sequence homology (Supplementary Table 2). These genes could serve as a guide for better understanding the differences between the two S. sonnei groups in terms of virulence and resistance.
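The core/accessory partition and the group-specific gene comparison described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the toy strain and gene identifiers, and the input format (a mapping from strain to a set of gene-cluster IDs) are all hypothetical. The sketch also assumes that a "unique" gene is one present in exactly one strain, and that a group-specific gene is one found in at least one strain of a group and in no strain of the other; the text does not specify either definition.

```python
from collections import Counter


def partition_pangenome(presence):
    """Split a pan-genome into core, accessory, and strain-unique genes.

    presence: dict mapping strain name -> set of gene-cluster IDs
    (hypothetical input format, e.g. derived from a presence/absence map).
    """
    n_strains = len(presence)
    # Count in how many strains each gene cluster occurs.
    counts = Counter(g for genes in presence.values() for g in genes)
    pan = set(counts)
    core = {g for g, n in counts.items() if n == n_strains}
    accessory = pan - core
    # Assumption: "unique" means present in exactly one strain.
    unique = {s: {g for g in genes if counts[g] == 1}
              for s, genes in presence.items()}
    return pan, core, accessory, unique


def group_specific(presence, group_a, group_b):
    """Genes present in at least one strain of group_a and in none of group_b."""
    in_a = set().union(*(presence[s] for s in group_a))
    in_b = set().union(*(presence[s] for s in group_b))
    return in_a - in_b


# Toy data with made-up strain and gene names, for illustration only.
toy = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g4"},
    "strainC": {"g1", "g5"},
}
pan, core, accessory, unique = partition_pangenome(toy)
core_pct = 100 * len(core) / len(pan)
only_ab = group_specific(toy, ["strainA", "strainB"], ["strainC"])

# Arithmetic check of the percentages reported in the text.
reported_core_pct = round(100 * 3893 / 5608, 2)
reported_accessory_pct = round(100 * 1715 / 5608, 2)
```

In the toy data, `g2` is accessory but not unique, since it occurs in two of the three strains; this is the distinction the sketch is meant to make explicit. The last two lines recompute the reported core and accessory shares from the counts given in the text.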