Background: Haploid cell lines are a valuable research tool with broad applicability for genetic assays. As such, the fully haploid human cell line eHAP1 has been used in a wide array of studies. However, the absence of a corresponding reference genome sequence for this cell line has limited its use in experiments that depend on an available sequence, such as capture-clone methodologies.
Results: We generated ~15x coverage of Nanopore long reads from ten GridION flowcells. We used these data to assemble a de novo draft genome with minimap and miniasm, which we then polished with Racon. This assembly was further polished with Pilon and ntEdit using previously generated, low-coverage Illumina short reads. This resulted in a hybrid eHAP1 assembly with >90% complete BUSCO scores. We further assessed the eHAP1 long-read data for structural variants using Sniffles and identified a variety of rearrangements, including a previously established Philadelphia translocation. Finally, we demonstrate how some of these variants overlap open chromatin regions, potentially impacting regulatory regions.
Conclusions: By integrating both long and short reads, we generated a high-quality reference assembly for eHAP1 cells. We identified structural variants using long reads, including some that may impact putative regulatory elements. Combining long and short reads demonstrates the utility of pairing sequencing platforms to generate a high-quality reference genome de novo solely from low-coverage data. We expect the resulting eHAP1 genome assembly to provide a useful resource to enable novel experimental applications in this important model cell line.
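As a rough illustration of what a figure such as the ~15x coverage reported in the Results means, mean depth of coverage is the total number of sequenced bases divided by the genome size. The sketch below is not the authors' code: the function names `estimate_coverage` and `read_n50` are hypothetical, and any genome size passed in (for example, about 3.1 Gb for a haploid human genome) is an assumption rather than a value taken from this study.

```python
from typing import Iterable, List


def estimate_coverage(read_lengths: Iterable[int], genome_size: int) -> float:
    """Mean depth of coverage: total sequenced bases / genome size.

    genome_size is supplied by the caller and is an assumption
    (e.g. roughly 3.1e9 bases for a haploid human genome).
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def read_n50(read_lengths: Iterable[int]) -> int:
    """Read N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    lengths: List[int] = sorted(read_lengths, reverse=True)
    if not lengths:
        raise ValueError("no reads supplied")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return lengths[-1]  # defensive fallback; the loop always returns first
```

Under these assumptions, about 46.5 Gb of reads over a 3.1 Gb genome would correspond to roughly 15x coverage. Read N50 is included because, for long-read assembly, read length distribution matters alongside raw depth.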
Introduction: The vast majority of eukaryotic cells are diploid, and many cellular models used experimentally are either diploid or polyploid. The presence of additional alleles, while evolutionarily beneficial, ...
Methods:
eHAP1 cell culture: eHAP1 cells were purchased from Horizon Discovery (SKU: c669). The cells were cultured in growth media consisting of 445 mL IMDM media (Gibco: 12440-053), 50 mL FBS, and 5 mL 100x Pen/Strep. Cells were passaged every 2-3 days at a ratio of 1:5. The cells were rapidly expanded after purchase to minimize the number of passages, and thus possible ploidy changes, prior to genomic DNA isolation.
Genomic DNA isolation, library prep, and sequencing: Genomic DNA was harvested from 5 million cells using the Circulomics Nanobind CBB Big DNA kit (Part #NB-900-001-01). The DNA was extracted following the included handbook (v1.7) protocol for "Cultured Mammalian Cells -HMW" with minor modifications. Specifically, cells were vortexed intensively (ten 1-second pulses), the final DNA was pipetted 10 times through a p200 tip, and, immediately prior to library preparation, the DNA was passed through a 28G needle five times. These steps helped bring the DNA into solution with minimal effect on fragment length. The genomic DNA was prepared using the Nanopore Ligation Sequencing Kit (SQK-LSK109) following the manufacturer's protocol (GDE_9063_v109_revD...