Pigs (Sus scrofa) exhibit diverse phenotypes across breeds, shaped by the combined effects of local adaptation and artificial selection. To comprehensively characterize the genetic diversity of pigs, we construct a pig pan-genome by comparing genome assemblies of 11 representative pig breeds with the reference genome (Sscrofa11.1). Approximately 72.5 Mb of non-redundant sequences absent from Sscrofa11.1 were identified as pan-sequences. On average, 41.7 kb of spurious heterozygous SNPs per individual are removed and 12.9 kb of novel SNPs per individual are recovered when the pan-genome is used as the reference for SNP calling, thereby providing enhanced resolution of genetic diversity in pigs.
Homolog annotation and analysis using RNA-seq and Hi-C data indicate that these pan-sequences contain protein-coding regions and regulatory elements. These pan-sequences can further improve the interpretation of local 3D structure. The pan-genome, together with the accompanying web-based database, will serve as a primary resource for exploring genetic diversity and will promote pig breeding and biomedical research.
Here we carried out an in-depth comparison between 11 de novo assemblies and the reference genome by analyzing assembly-versus-assembly alignments.
The final pan-genome comprises 39,744 newly added sequences (total length: 72.5 Mb), of which 607 demonstrate coding potential. Furthermore, the three-dimensional (3D) spatial structure of the pan-genome was depicted by characterizing the pan-genome in A/B compartments (generally euchromatic and heterochromatic regions, respectively) and topologically associating domains (TADs). We also built a pig pan-genome database (PIGPAN, http://animal.nwsuaf.edu.cn/code/index.php/panPig), which can serve as a fundamental resource for unlocking variation within diverse pig breeds.
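As an illustration of how sequences absent from a reference can be derived from assembly-versus-assembly alignments, the following minimal sketch takes the aligned intervals on one assembly contig and returns the uncovered regions. It is not the pipeline used in this study: the function names, the half-open coordinate convention, and the `min_len` threshold are assumptions made for the example only.

```python
from typing import List, Tuple

# Half-open interval [start, end) on a single assembly contig (assumed convention).
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent half-open intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def unaligned_regions(
    contig_len: int, aligned: List[Interval], min_len: int = 1
) -> List[Interval]:
    """Return regions of [0, contig_len) not covered by any aligned interval.

    min_len is a hypothetical length filter, not a value taken from the paper.
    """
    gaps: List[Interval] = []
    cursor = 0
    for start, end in merge_intervals(aligned):
        if start - cursor >= min_len:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    # Trailing uncovered region after the last alignment.
    if contig_len - cursor >= min_len:
        gaps.append((cursor, contig_len))
    return gaps


if __name__ == "__main__":
    # Toy alignment coordinates for a 1,000 bp contig.
    hits = [(100, 300), (250, 400), (700, 900)]
    print(unaligned_regions(1000, hits, min_len=50))
```

In practice the aligned intervals would be parsed from the aligner's tabular output for each contig, and the resulting regions would still need redundancy removal across assemblies before they could be treated as pan-sequences.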
Results
Initial characterization of pan-sequences in the pig genome
To construct the pig pan-genome, we first aligned 11 assemblies from 11 genetically distinct breeds (five from Europe and six from China) against Sscrofa11.1 using BLASTN to identify the unaligned sequences (Fig. 1a and Supplementary Fig. 2). The length of the unaligned sequences was significantly greater in the Chinese pigs than in the European pigs (P < 0.01), since the reference genome is from a European pig (Fig. 1a). As expected, the Wuzhishan assembly had the largest number of unaligned sequences, because it derives from the only male individual among the 11 assemblies and therefore contributes many male-specific sequences (Fig. 1a and Supplementary Table 2). After removing redundant sequences, we obtained 39,744 sequences with a total length of 72.5 Mb (Fig. 1b), which were absent from Sscrofa11.1 and were thus defined as pan-sequences. The contents of repetitive elements (45.91%) and GC (44.61%) in these sequences were slightly higher than those in Sscrofa11.1 (45.19% and 41.5%, respectively) (Fig. 1...