Genomes assembled de novo from short reads are highly fragmented relative to the finished chromosomes of H. sapiens and key model organisms generated by the Human Genome Project. To address this, we need scalable, cost-effective methods enabling chromosome-scale contiguity. Here we show that genome-wide chromatin interaction datasets, such as those generated by Hi-C, are a rich source of long-range information for assigning, ordering and orienting genomic sequences to chromosomes, including across centromeres. To exploit this, we developed an algorithm that uses Hi-C data for ultra-long-range scaffolding of de novo genome assemblies. We demonstrate the approach by combining shotgun fragment and short jump mate-pair sequences with Hi-C data to generate chromosome-scale de novo assemblies of the human, mouse and Drosophila genomes, achieving – for human – 98% accuracy in assigning scaffolds to chromosome groups and 99% accuracy in ordering and orienting scaffolds within chromosome groups. Hi-C data can also be used to validate chromosomal translocations in cancer genomes.
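The chromosome-assignment step described above can be illustrated with a toy example. This is a minimal sketch, not the authors' algorithm: it assumes that contigs on the same chromosome share more Hi-C links per unit length than contigs on different chromosomes, and it greedily merges the two clusters with the highest link density until a chosen number of groups remains. The function name `group_contigs`, the length normalization, and the example data are all hypothetical.

```python
from itertools import combinations

def group_contigs(links, lengths, n_groups):
    """Greedily merge contigs into n_groups clusters by Hi-C link density.

    links   -- dict mapping frozenset({contig_a, contig_b}) to a link count
    lengths -- dict mapping contig name to its length in bp
    (Hypothetical toy interface, assumed for illustration only.)
    """
    clusters = [{name} for name in lengths]

    def density(a, b):
        # Links between two clusters, normalized by the product of
        # their total lengths so that long contigs are not favored.
        total = sum(links.get(frozenset((x, y)), 0) for x in a for y in b)
        len_a = sum(lengths[x] for x in a)
        len_b = sum(lengths[y] for y in b)
        return total / (len_a * len_b)

    while len(clusters) > n_groups:
        # Pick the pair of clusters with the highest link density and merge.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: density(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

# Made-up data: a/b and c/d are strongly linked, cross links are sparse.
example_lengths = {"a": 100, "b": 100, "c": 100, "d": 100}
example_links = {
    frozenset(("a", "b")): 50,
    frozenset(("c", "d")): 40,
    frozenset(("a", "c")): 2,
    frozenset(("b", "d")): 1,
}
groups = group_contigs(example_links, example_lengths, n_groups=2)
print(sorted(sorted(g) for g in groups))
```

Ordering and orienting contigs within each group are separate problems and are not covered by this sketch.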
Genetic studies of human evolution require high-quality contiguous ape genome assemblies that are not guided by the human reference. We coupled long-read sequence assembly and full-length cDNA sequencing with a multi-platform scaffolding approach to produce ab initio chimpanzee and orangutan genome assemblies. Comparing these with two long-read de novo human genome assemblies and a gorilla genome assembly, we characterized lineage-specific and shared great ape genetic variation ranging from single base-pair to megabase-sized variants. We identified ~17,000 fixed human-specific structural variants, which point to genic and putative regulatory changes that emerged in humans since divergence from nonhuman apes. Interestingly, these fixed human-specific structural variants are enriched near genes that are downregulated in human compared to chimpanzee cerebral organoids, particularly in cells analogous to radial glial neural progenitors.
The HeLa cell line was established in 1951 from cervical cancer cells taken from a patient, Henrietta Lacks, marking the first successful attempt to continually culture human-derived cells in vitro [1]. HeLa’s robust growth and unrestricted distribution resulted in its broad adoption – both intentionally and through widespread cross-contamination [2] – and for the past sixty years it has served a role analogous to that of a model organism [3]. Its cumulative impact is illustrated by the fact that HeLa is named in >74,000 or ~0.3% of PubMed abstracts. The genomic architecture of HeLa remains largely unexplored beyond its karyotype [4], in part because, as with many cancers, its extensive aneuploidy renders such analyses challenging. We performed haplotype-resolved whole genome sequencing [5] of the HeLa CCL-2 strain, discovering point and indel variation, mapping copy-number and loss of heterozygosity (LOH), and phasing variants across full chromosome arms. We further investigated variation and copy-number profiles for HeLa S3 and eight additional strains. Surprisingly, HeLa is relatively stable with respect to point variation, accumulating few new mutations since early passaging. Haplotype resolution facilitated reconstruction of an amplified, highly rearranged region at chromosome 8q24.21 at which the HPV-18 viral genome integrated as the likely initial event underlying tumorigenesis. We combined these maps with RNA-Seq [6] and ENCODE Project [7] datasets to phase the HeLa epigenome, revealing strong, haplotype-specific activation of the proto-oncogene MYC by the integrated HPV-18 genome ~500 kilobases upstream, and permitting global analyses of the relationship between gene dosage and expression. These data provide an extensively phased, high-quality reference genome for past and future experiments relying on HeLa, and demonstrate the value of haplotype resolution for characterizing cancer genomes and epigenomes.
We present single-cell combinatorial indexed Hi-C (sciHi-C), which applies the concept of combinatorial cellular indexing to chromosome conformation capture. In this proof-of-concept, we generate and sequence six sciHi-C libraries comprising a total of 10,696 single cells. We use sciHi-C data to separate cells by karyotypic and cell-cycle state differences and identify cell-to-cell heterogeneity in mammalian chromosomal conformation. Our results demonstrate that combinatorial indexing is a generalizable strategy for single-cell genomics.
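The idea behind combinatorial indexing can be sketched in a few lines. The example below assumes two rounds of barcoding, so that each contact carries a pair of barcodes and the combination, rather than either barcode alone, identifies a cell. The record layout, the barcode sequences, and the helper names `demultiplex` and `cis_fraction` are hypothetical and are not taken from the sciHi-C protocol or software.

```python
from collections import defaultdict

def demultiplex(read_pairs):
    """Group Hi-C contacts by their (round-1, round-2) barcode combination.

    Each record is (barcode_1, barcode_2, locus_a, locus_b), where a locus
    is a (chromosome, position) tuple. Hypothetical layout for illustration.
    """
    cells = defaultdict(list)
    for bc1, bc2, locus_a, locus_b in read_pairs:
        # The barcode pair acts as the cell identifier.
        cells[(bc1, bc2)].append((locus_a, locus_b))
    return dict(cells)

def cis_fraction(contacts):
    """Fraction of contacts whose two ends lie on the same chromosome."""
    cis = sum(1 for (chrom_a, _), (chrom_b, _) in contacts if chrom_a == chrom_b)
    return cis / len(contacts)

# Made-up reads: three distinct barcode combinations, hence three cells.
reads = [
    ("AAC", "GGT", ("chr1", 100), ("chr1", 5000)),
    ("AAC", "GGT", ("chr1", 200), ("chr2", 300)),
    ("AAC", "TTA", ("chr3", 10), ("chr3", 900)),
    ("CCG", "GGT", ("chr2", 50), ("chr2", 70)),
]
cells = demultiplex(reads)
for barcode_pair, contacts in sorted(cells.items()):
    print(barcode_pair, len(contacts), cis_fraction(contacts))
```

Note that the first and third reads share a round-1 barcode, and the first and fourth share a round-2 barcode, yet all three are assigned to different cells because only the full combination is used as the key.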