Essential features of cancer tissue cellular heterogeneity, such as negatively selected genome topologies, sub-clonal mutation patterns and genome replication states, can only be studied effectively by sequencing single-cell genomes at scale and with high fidelity. Using an amplification-free single-cell genome sequencing approach implemented on commodity hardware (DLP+), coupled with a cloud-based computational platform, we define a resource of 40,000 single-cell genomes characterized by their genome states across a wide range of tissue types and conditions. We show that shallow sequencing across thousands of genomes permits reconstruction of clonal genomes to single-nucleotide resolution through aggregation analysis of cells sharing higher-order genome structure. From large-scale population analysis over thousands of cells, we identify rare cells exhibiting mitotic mis-segregation of whole chromosomes. We observe that tissue-derived scWGS libraries exhibit lower rates of whole-chromosome aneuploidy than cell lines, and that in breast epithelium, loss of p53 results in a shift in event type but not in overall prevalence. Finally, we demonstrate that the replication states of genomes can be inferred, so that the number and proportion of replicating cells, as well as the chromosomal pattern of replication, can be unambiguously determined in single-cell genome sequencing experiments. The combined annotated resource and approach provide a re-implementable, large-scale platform for studying lineages and tissue heterogeneity.
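To make the aggregation idea concrete, here is a minimal sketch of pooling per-cell allele counts into clone-level "pseudobulk" counts. It assumes that every cell has already been assigned a clone label from its copy number profile and that read counts at candidate SNV sites are available per cell. The function names, the tuple layout of the input and the coverage figures in the usage note are hypothetical illustrations, not the DLP+ pipeline. The sketch also assumes that coverage simply adds across independent cells, ignoring duplicates and uneven coverage.

```python
from collections import defaultdict


def aggregate_by_clone(counts, clone_of):
    """Sum ref/alt read counts across all cells assigned to the same clone.

    counts:   iterable of (cell_id, site, ref_reads, alt_reads); at shallow
              per-cell coverage most cells contribute 0 or 1 read per site.
    clone_of: dict mapping cell_id -> clone label (hypothetical, e.g. from
              clustering copy number profiles).
    Returns a dict mapping (clone, site) -> [ref_total, alt_total].
    """
    pooled = defaultdict(lambda: [0, 0])
    for cell_id, site, ref, alt in counts:
        clone = clone_of.get(cell_id)
        if clone is None:
            continue  # cells without a clone assignment are skipped
        pooled[(clone, site)][0] += ref
        pooled[(clone, site)][1] += alt
    return dict(pooled)


def expected_pooled_depth(per_cell_depth, n_cells):
    """Expected mean pseudobulk depth, assuming coverage adds linearly."""
    return per_cell_depth * n_cells


def clone_vaf(ref_total, alt_total):
    """Variant allele fraction of the pooled counts (None if no reads)."""
    depth = ref_total + alt_total
    return alt_total / depth if depth else None
```

For example, under these assumptions a hypothetical clone of 200 cells each sequenced to about 0.05x would pool to roughly 10x (`expected_pooled_depth(0.05, 200)`). That depth is enough to evaluate SNVs at the clone level even though no single cell is informative on its own.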
Biological and physical determinants of high-quality DLP+ library construction
We initially applied the reaction conditions from microfluidic DLP (Zahn et al., 2017) to establish amplification-free single-cell WGS in open arrays (DLP+). However, this yielded many poor-quality libraries, as indicated by a high proportion of: i) alignments from which interpretable, integer-state copy number profiles could not be inferred; and ii) failed libraries with low or absent coverage (Figure 2a, c ii: 1 nL G2 buffer). We therefore sought to establish quantitatively the physical reaction determinants of high-quality libraries (e.g. Figure S4b), using the quality score computed by the classifier. We systematically varied and evaluated several factors: cell lysis volume and buffer type; transposase (Tn5) concentration; number of post-indexing PCR cycles; cell lysis/DNA solubilisation time; and cell viability state.
We identified the following determinants of high-quality libraries. Cell lysis volume and buffer type had a combinatorial effect: the larger volumes needed to avoid meniscus effects and evaporation required specific buffers that could be diluted in one-pot reactions without impairing subsequent reactions (Figure 2c ii). As expected, increasing the transposase (Tn5) concentration increased library success (Figure 2c iii), but at the cost of increased bias in sequence GC representation (Figure 2e) and, consequently, in genome coverage (Figure 2f). Similarly, we observed that an increase in post-indexing PCR...