Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.
We recently used in situ Hi-C to create kilobase-resolution 3D maps of mammalian genomes. Here, we combine these maps with new Hi-C, microscopy, and genome-editing experiments to study the physical structure of chromatin fibers, domains, and loops. We find that the observed contact domains are inconsistent with the equilibrium state for an ordinary condensed polymer. Combining Hi-C data and novel mathematical theorems, we show that contact domains are also not consistent with a fractal globule. Instead, we use physical simulations to study two models of genome folding. In one, intermonomer attraction during polymer condensation leads to formation of an anisotropic "tension globule." In the other, CCCTC-binding factor (CTCF) and cohesin act together to extrude unknotted loops during interphase. Both models are consistent with the observed contact domains and with the observation that contact domains tend to form inside loops. However, the extrusion model explains a far wider array of observations, such as why loops tend not to overlap and why the CTCF-binding motifs at pairs of loop anchors lie in the convergent orientation. Finally, we perform 13 genome-editing experiments examining the effect of altering CTCF-binding sites on chromatin folding. The convergent rule correctly predicts the affected loops in every case. Moreover, the extrusion model accurately predicts in silico the 3D maps resulting from each experiment using only the location of CTCF-binding sites in the WT. Thus, we show that it is possible to disrupt, restore, and move loops and domains using targeted mutations as small as a single base pair.genome architecture | molecular dynamics | CTCF | chromatin loops | CRISPR
Targeting genomic loci by massively parallel sequencing requires new methods to enrich templates to be sequenced. We developed a capture method that uses biotinylated RNA “baits” to “fish” targets out of a “pond” of DNA fragments. The RNA is transcribed from PCR-amplified oligodeoxynucleotides originally synthesized on a microarray, generating sufficient bait for multiple captures at concentrations high enough to drive the hybridization. We tested this method with 170-mer baits that target >15,000 coding exons (2.5 Mb) and four regions (1.7 Mb total) using Illumina sequencing as read-out. About 90% of uniquely aligning bases fell on or near bait sequence; up to 50% lay on exons proper. The uniformity was such that ~60% of target bases in the exonic “catch”, and ~80% in the regional catch, had at least half the mean coverage. One lane of Illumina sequence was sufficient to call high-confidence genotypes for 89% of the targeted exon space.
Learning to read and write the transcriptional regulatory code is of central importance to progress in genetic analysis and engineering. Here, we describe a massively parallel reporter assay (MPRA) that enables systematic dissection of transcriptional regulatory elements by integrating microarray-based DNA synthesis and high-throughput tag sequencing. We apply MPRA to compare more than 27,000 distinct variants of two inducible enhancers in human cells: a synthetic cAMP-regulated enhancer and the virus-inducible interferon beta enhancer. We first show that the resulting data define accurate maps of functional transcription factor binding sites in both enhancers at single-nucleotide resolution. We then use the data to train quantitative sequence-activity models (QSAMs) of the two enhancers. We show that QSAMs from two cellular states can be combined to identify novel enhancer variants that optimize potentially conflicting objectives, such as maximizing induced activity while minimizing basal activity.
Summary Genome-wide association studies (GWAS) have successfully identified thousands of associations between common genetic variants and human disease phenotypes, but the majority of these variants are non-coding, often requiring genetic fine-mapping, epigenomic profiling, and individual reporter assays to delineate potential causal variants. We employ a massively parallel reporter assay (MPRA) to simultaneous screen 2756 variants in strong linkage-disequilibrium with 75 sentinel variants associated with red blood cell traits. We show that this assay identifies elements with endogenous erythroid regulatory activity. Across 23 sentinel variants, we conservatively identified 32 MPRA functional variants (MFVs). We demonstrate endogenous enhancer activity across 3 MFVs that predominantly affect the transcription of SMIM1, RBM38, and CD164 using targeted genome editing. Functional follow up of RBM38 delineates a key role for this gene in the alternative splicing program occurring during terminal erythropoiesis. Finally, we provide evidence for how common GWAS-nominated variants can disrupt cell-type specific transcriptional regulatory pathways.
Pneumocystis jirovecii is a major cause of life-threatening pneumonia in immunosuppressed patients including transplant recipients and those with HIV/AIDS, yet surprisingly little is known about the biology of this fungal pathogen. Here we report near complete genome assemblies for three Pneumocystis species that infect humans, rats and mice. Pneumocystis genomes are highly compact relative to other fungi, with substantial reductions of ribosomal RNA genes, transporters, transcription factors and many metabolic pathways, but contain expansions of surface proteins, especially a unique and complex surface glycoprotein superfamily, as well as proteases and RNA processing proteins. Unexpectedly, the key fungal cell wall components chitin and outer chain N-mannans are absent, based on genome content and experimental validation. Our findings suggest that Pneumocystis has developed unique mechanisms of adaptation to life exclusively in mammalian hosts, including dependence on the lungs for gas and nutrients and highly efficient strategies to escape both host innate and acquired immune defenses.
Genome-wide chromatin annotations have permitted the mapping of putative regulatory elements across multiple human cell types. However, their experimental dissection by directed regulatory motif disruption has remained unfeasible at the genome scale. Here, we use a massively parallel reporter assay (MPRA) to measure the transcriptional levels induced by 145-bp DNA segments centered on evolutionarily conserved regulatory motif instances within enhancer chromatin states. We select five predicted activators (HNF1, HNF4, FOXA, GATA, NFE2L2) and two predicted repressors (GFI1, ZFP161) and measure reporter expression in erythroleukemia (K562) and liver carcinoma (HepG2) cell lines. We test 2104 wild-type sequences and 3314 engineered enhancer variants containing targeted motif disruptions, each using 10 barcode tags and two replicates. The resulting data strongly confirm the enhancer activity and cell-type specificity of enhancer chromatin states, the ability of 145-bp segments to recapitulate both, the necessary role of regulatory motifs in enhancer function, and the complementary roles of activator and repressor motifs. We find statistically robust evidence that (1) disrupting the predicted activator motifs abolishes enhancer function, while silent or motif-improving changes maintain enhancer activity; (2) evolutionary conservation, nucleosome exclusion, binding of other factors, and strength of the motif match are predictive of enhancer activity; (3) scrambling repressor motifs leads to aberrant reporter expression in cell lines where the enhancers are usually inactive. Our results suggest a general strategy for deciphering cis-regulatory elements by systematic large-scale manipulation and provide quantitative enhancer activity measurements across thousands of constructs that can be mined to develop predictive models of gene expression.
Plasmodium vivax is a major public health burden, responsible for the majority of malaria infections outside Africa. We explored the impact of demographic history and selective pressures on the P. vivax genome by sequencing 182 clinical isolates sampled from 11 countries across the globe, using hybrid selection to overcome human DNA contamination. We confirmed previous reports of high genomic diversity in P. vivax relative to the more virulent Plasmodium falciparum species; regional populations of P. vivax exhibited greater diversity than the global P. falciparum population, indicating a large and/or stable population. Signals of natural selection suggest that P. vivax is evolving in response to antimalarial drugs and is adapting to regional differences in the human host and the mosquito vector. These findings underline the variable epidemiology of this parasite species and highlight the breadth of approaches that may be required to eliminate P. vivax globally.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.