The decrease in sequencing cost and increased sophistication of assembly algorithms for short-read platforms has resulted in a sharp increase in the number of species with genome assemblies. However, these assemblies are highly fragmented, with many gaps, ambiguities, and errors, impeding downstream applications. We demonstrate current state of the art for de novo assembly using the domestic goat (Capra hircus), based on long reads for contig formation, short reads for consensus validation, and scaffolding by optical and chromatin interaction mapping. These combined technologies produced the most continuous de novo mammalian assembly to date, with chromosome-length scaffolds and only 649 gaps. Our assembly represents a ~400-fold improvement in continuity due to properly assembled gaps compared to the previously published C. hircus assembly, and better resolves repetitive structures longer than 1 kb, representing the largest repeat family and immune gene complex ever produced for an individual of a ruminant species.
The cow rumen is adapted for the breakdown of plant material into energy and nutrients, a task largely performed by enzymes encoded by the rumen microbiome. Here we present 913 draft bacterial and archaeal genomes assembled from over 800 Gb of rumen metagenomic sequence data derived from 43 Scottish cattle, using both metagenomic binning and Hi-C-based proximity-guided assembly. Most of these genomes represent previously unsequenced strains and species. The draft genomes contain over 69,000 proteins predicted to be involved in carbohydrate metabolism, over 90% of which do not have a good match in public databases. Inclusion of the 913 genomes presented here improves metagenomic read classification by sevenfold against our own data, and by fivefold against other publicly available rumen datasets. Thus, our dataset substantially improves the coverage of rumen microbial genomes in the public databases and represents a valuable resource for biomass-degrading enzyme discovery and studies of the rumen microbiome.
Mcm4 (minichromosome maintenance-deficient 4 homolog) encodes a subunit of the MCM2-7 complex (also known as MCM2-MCM7), the replication licensing factor and presumptive replicative helicase. Here, we report that the mouse chromosome instability mutation Chaos3 (chromosome aberrations occurring spontaneously 3), isolated in a forward genetic screen, is a viable allele of Mcm4. Mcm4(Chaos3) encodes a change in an evolutionarily invariant amino acid (F345I), producing an apparently destabilized MCM4. Saccharomyces cerevisiae strains that we engineered to contain a corresponding allele (resulting in an F391I change) showed a classical minichromosome loss phenotype. Whereas homozygosity for a disrupted Mcm4 allele (Mcm4(-)) caused preimplantation lethality, Mcm(Chaos3/-) embryos died late in gestation, indicating that Mcm4(Chaos3) is hypomorphic. Mutant embryonic fibroblasts were highly susceptible to chromosome breaks induced by the DNA replication inhibitor aphidicolin. Most notably, >80% of Mcm4(Chaos3/Chaos3) females succumbed to mammary adenocarcinomas with a mean latency of 12 months. These findings suggest that hypomorphic alleles of the genes encoding the subunits of the MCM2-7 complex may increase breast cancer risk.
Microbial communities consist of mixed populations of organisms, including unknown species in unknown abundances. These communities are often studied through metagenomic shotgun sequencing, but standard library construction methods remove long-range contiguity information; thus, shotgun sequencing and de novo assembly of a metagenome typically yield a collection of contigs that cannot readily be grouped by species. Methods for generating chromatin-level contact probability maps, e.g., as generated by the Hi-C method, provide a signal of contiguity that is completely intracellular and contains both intrachromosomal and interchromosomal information. Here, we demonstrate how this signal can be exploited to reconstruct the individual genomes of microbial species present within a mixed sample. We apply this approach to two synthetic metagenome samples, successfully clustering the genome content of fungal, bacterial, and archaeal species with more than 99% agreement with published reference genomes. We also show that the Hi-C signal can secondarily be used to create scaffolded genome assemblies of individual eukaryotic species present within the microbial community, with higher levels of contiguity than some of the species’ published reference genomes.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.