The incomplete identification of structural variants from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long- and short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three human parent-child trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,181 indel variants (<50 bp) and 31,599 structural variants (≥50 bp) per human genome, a sevenfold increase in structural variation compared to previous reports, including from the 1000 Genomes Project. We also discovered 156 inversions per genome, most of which previously escaped detection, as well as large unbalanced chromosomal rearrangements. We provide near-complete, haplotype-resolved structural variation for three genomes that can now be used as a gold standard for the scientific community, and we make specific recommendations for maximizing structural variation sensitivity for future large-scale genome sequencing studies.
INTRODUCTION
Structural variants (SVs) contribute greater diversity at the nucleotide level between two human genomes than any other form of genetic variation (Conrad et al. 2010; Kidd et al. 2010; Korbel et al. 2007; Sudmant et al. 2015). To date, such variation has been difficult to identify and characterize from the large number of human genomes that have been sequenced using short-read, high-throughput sequencing technologies. The methods to detect SVs in these datasets are dependent, in part, on indirect inferences (e.g., read-depth and discordant read-pair mapping). The limited number of SVs observed directly using split-read approaches (Rausch et al. 2012; Kronenberg et al. 2015; Ye et al. 2009) is constrained by the short length of these sequencing reads. Moreover, while larger copy number variants (CNVs) could be identified using microarray and read-depth approaches, smaller events (<5 kbp) and balanced events, such as inversions, remain poorly ascertained.