Next-generation sequencing technologies have revolutionized the field of paleogenomics, allowing the reconstruction of complete ancient genomes and their comparison with modern references. However, this requires the processing of vast amounts of data and involves a large number of steps that use a variety of computational tools. Here we present PALEOMIX (http://geogenetics.ku.dk/publications/paleomix), a flexible and user-friendly pipeline applicable to both modern and ancient genomes, which largely automates the in silico analyses behind whole-genome resequencing. Starting with next-generation sequencing reads, PALEOMIX carries out adapter removal, mapping against reference genomes, PCR duplicate removal, characterization of and compensation for postmortem damage, SNP calling and maximum-likelihood phylogenomic inference, and it profiles the metagenomic contents of the samples. As such, PALEOMIX allows for a series of potential applications in paleogenomics, comparative genomics and metagenomics. Applying the PALEOMIX pipeline to the three ancient and seven modern Phytophthora infestans genomes as described here takes 5 d using a 16-core server.
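PCR duplicate removal, one of the steps listed above, can be illustrated with a minimal sketch. It assumes single-end reads already mapped to a reference and treats reads that share a chromosome, strand and 5′ mapping position as duplicates, keeping the copy with the highest mean base quality. The `AlignedRead` record and `remove_pcr_duplicates` function are hypothetical names used only for illustration. This is not PALEOMIX's implementation, and dedicated tools also handle paired-end and collapsed (merged) reads, for which both ends of the original molecule are known.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AlignedRead:
    """Hypothetical minimal record for one mapped single-end read."""
    name: str
    chrom: str
    start: int           # leftmost mapped position (0-based)
    end: int             # one past the rightmost mapped position
    reverse: bool        # True if the read maps to the reverse strand
    mean_quality: float  # mean base quality, used to pick the read to keep


def remove_pcr_duplicates(reads: List[AlignedRead]) -> List[AlignedRead]:
    """Keep one read per (chromosome, 5' position, strand) group."""
    best: Dict[Tuple[str, int, bool], AlignedRead] = {}
    for read in reads:
        # The 5' end is the start for forward reads and the end for reverse reads
        five_prime = read.end if read.reverse else read.start
        key = (read.chrom, five_prime, read.reverse)
        kept = best.get(key)
        if kept is None or read.mean_quality > kept.mean_quality:
            best[key] = read
    # Return the surviving reads in coordinate order
    return sorted(best.values(), key=lambda r: (r.chrom, r.start, r.name))


reads = [
    AlignedRead("r1", "chr1", 100, 150, False, 30.0),
    AlignedRead("r2", "chr1", 100, 148, False, 35.0),  # duplicate of r1, higher quality
    AlignedRead("r3", "chr1", 120, 200, True, 20.0),
    AlignedRead("r4", "chr1", 130, 200, True, 25.0),   # same 5' end as r3 on the reverse strand
]
print([r.name for r in remove_pcr_duplicates(reads)])  # ['r2', 'r4']
```

Keying on the 5′ end rather than the leftmost coordinate matters for reverse-strand reads, whose leftmost position shifts with read length and adapter trimming.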
Through domestication, humans have substantially altered the morphology of Zea mays ssp. parviglumis (teosinte), transforming it into present-day maize. This system serves as a model for studying adaptation, genome evolution, and the genetics and evolution of complex traits. To examine how domestication has reshaped the transcriptome of maize seedlings, we profiled the expression of 18,242 genes in 38 diverse maize genotypes and 24 teosinte genotypes. We detected evidence that more than 600 genes have significantly different expression levels in maize compared with teosinte. Moreover, more than 1,100 genes showed significantly altered coexpression profiles, reflecting substantial rewiring of the transcriptome since domestication. The genes with altered expression are significantly enriched for genes previously identified through population genetic analyses as likely targets of selection during maize domestication and improvement; 46 genes previously identified as putative targets of selection also exhibit altered expression levels and coexpression relationships. We also identified 45 genes with altered, primarily higher, expression in inbred relative to outcrossed teosinte. These genes are enriched for functions related to biotic stress and may reflect responses to the effects of inbreeding. This study not only documents alterations in the maize transcriptome following domestication, identifying several genes that may have contributed to the evolution of maize, but also highlights the complementary information that can be gained by combining gene expression with population genetic analyses.
The Y chromosome directly reflects male genealogies, but the extremely low Y chromosome sequence diversity in horses has prevented the reconstruction of stallion genealogies [1, 2]. Here, we resolve the first Y chromosome genealogy of modern horses by screening 1.46 Mb of the male-specific region of the Y chromosome (MSY) in 52 horses from 21 breeds. Based on highly accurate pedigree data, we estimated the de novo mutation rate of the horse MSY and showed that various modern horse Y chromosome lineages split much later than the domestication of the species. Apart from a few private northern European haplotypes, all modern horse breeds clustered together in a roughly 700-year-old haplogroup that was transmitted to Europe by the import of Oriental stallions. The Oriental horse group consisted of two major subclades: the Original Arabian lineage and the Turkoman horse lineage. We show that the English Thoroughbred MSY was derived from the Turkoman lineage and that English Thoroughbred sires are largely responsible for the predominance of this haplotype in modern horses.
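A pedigree-based estimate of the MSY mutation rate rests on simple arithmetic: the number of de novo mutations observed along father-to-son transmissions, divided by the number of sites screened and by the number of transmissions (generations) over which they were observed. The sketch below assumes that every screened site was callable in every sampled male; the function name and the example numbers are hypothetical and are not the study's data.

```python
def msy_mutation_rate(n_mutations: int, callable_sites: int, n_transmissions: int) -> float:
    """Per-site, per-generation mutation rate estimated from a pedigree.

    n_transmissions: total father-to-son transmissions along the pedigree
    paths connecting the sequenced males (hypothetical input).
    """
    if callable_sites <= 0 or n_transmissions <= 0:
        raise ValueError("callable_sites and n_transmissions must be positive")
    return n_mutations / (callable_sites * n_transmissions)


# Hypothetical example: 2 mutations over 1,000,000 sites and 20 transmissions
rate = msy_mutation_rate(2, 1_000_000, 20)
print(f"{rate:.1e} per site per generation")  # 1.0e-07 per site per generation
```

Dividing the result by an assumed generation time (years per generation) converts it to a per-year rate, which is what allows lineage split times to be compared with the timing of domestication.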
Genome-wide association studies (GWAS) have identified loci linked to hundreds of traits in many different species. Yet, because linkage disequilibrium implicates a broad region surrounding each identified locus, the causal genes often remain unknown. This problem is especially pronounced in nonhuman, nonmodel species, where functional annotations are sparse and little information is often available for prioritizing candidate genes. We developed a computational approach, Camoco, that integrates loci identified by GWAS with functional information derived from gene coexpression networks. Using Camoco, we prioritized candidate genes from a large-scale GWAS examining the accumulation of 17 different elements in maize (Zea mays) seeds. Strikingly, the performance of our approach depended strongly on the type of coexpression network used: a network built from expression variation across genetically diverse individuals in a relevant tissue context (in our case, roots, which are the primary elemental uptake and delivery system) outperformed alternative networks. Two candidate genes identified by our approach were validated using mutants. Our study demonstrates that coexpression networks provide a powerful basis for prioritizing candidate causal genes from GWAS loci, but suggests that the success of such strategies can depend heavily on the context of the gene expression data. Both the software and the lessons on integrating GWAS data with coexpression networks generalize to species beyond maize.
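The core idea of ranking GWAS candidate genes by coexpression can be sketched as follows. This is a simplified illustration, not Camoco's actual scoring metrics. It assumes an expression matrix with genes as rows and genetically diverse individuals (for example, root samples) as columns, and scores each candidate gene by its mean Pearson correlation with candidate genes assigned to other GWAS loci, so that genes coexpressed with candidates at independent loci rank higher. The function name and the `locus_of` mapping are hypothetical.

```python
from typing import Dict, List

import numpy as np


def coexpression_scores(expr: np.ndarray, candidates: List[int],
                        locus_of: Dict[int, int]) -> Dict[int, float]:
    """Mean Pearson correlation of each candidate with candidates at other loci.

    expr: genes x samples expression matrix (rows are genes).
    candidates: row indices of candidate genes near GWAS SNPs.
    locus_of: maps each candidate row index to a GWAS locus ID (hypothetical).
    """
    corr = np.corrcoef(expr)  # genes x genes Pearson correlation matrix
    scores: Dict[int, float] = {}
    for gene in candidates:
        # Skip candidates from the same locus: they are linked, not independent evidence
        partners = [g for g in candidates if locus_of[g] != locus_of[gene]]
        scores[gene] = float(np.mean(corr[gene, partners])) if partners else 0.0
    return scores


expr = np.array([[1.0, 2.0, 3.0, 4.0],   # gene 0, locus 0
                 [2.0, 4.0, 6.0, 8.0],   # gene 1, locus 1
                 [4.0, 3.0, 2.0, 1.0]])  # gene 2, locus 1
scores = coexpression_scores(expr, [0, 1, 2], {0: 0, 1: 1, 2: 1})
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # [1, 0, 2]
```

Because the score depends entirely on which samples define the correlation matrix, swapping in expression data from a different tissue or a less diverse panel changes the ranking, which is consistent with the network-context dependence described above.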
Analysis of the Y chromosome is the best-established way to reconstruct paternal family history in humans. Here, we applied fine-scaled Y-chromosomal haplotyping in horses with biallelic markers and demonstrated the potential of our approach to address the ancestry of sire lines. We assembled de novo a draft reference of the male-specific region of the Y chromosome from Illumina short reads and then screened 5.8 million base pairs for variants in 130 specimens from intensively selected and rural breeds and in nine Przewalski’s horses. Among domestic horses, we confirmed the predominance of a young ‘crown haplogroup’ in Central European and North American breeds. Within the crown, we distinguished 58 haplotypes based on 211 variants, forming three major haplogroups. In addition to two previously characterised haplogroups, one observed in Arabian/Coldblooded and the other in Turkoman/Thoroughbred horses, we uncovered a third haplogroup containing Iberian lines and a North African Barb Horse. In a genealogical showcase, we distinguished the patrilines of the three English Thoroughbred founder stallions and resolved a historic controversy over the parentage of the horse ‘Galopin’, born in 1872. We observed two nearly instantaneous radiations in the history of Central and Northern European Y-chromosomal lineages, both of which occurred after domestication 5,500 years ago.
Background: To date, genome-scale analyses in the domestic horse have been limited by suboptimal single nucleotide polymorphism (SNP) density and uneven genomic coverage of the current SNP genotyping arrays. The recent availability of whole genome sequences has created the opportunity to develop a next-generation, high-density equine SNP array.
Results: Using whole genome sequences from 153 individuals representing 24 distinct breeds, collated by the equine genomics community, we cataloged over 23 million de novo discovered genetic variants. Leveraging genotype data from individuals with both whole genome sequences and genotypes from lower-density legacy SNP arrays, we selected a subset of ~5 million high-quality, high-density array candidate SNPs based on breed representation and uniform spacing across the genome. Considering probe design recommendations from a commercial vendor (Affymetrix, now Thermo Fisher Scientific), we selected a set of ~2 million SNPs for a next-generation high-density SNP chip (MNEc2M). Genotype data were generated with the MNEc2M array for a cohort of 332 horses from 20 breeds, and a lower-density array consisting of ~670 thousand SNPs (MNEc670k) was designed for genotype imputation.
Conclusions: Here, we document the steps taken to design both the MNEc2M and MNEc670k arrays, report the genomic and technical properties of these genotyping platforms, and demonstrate the imputation capabilities of these tools for the domestic horse.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-3943-8) contains supplementary material, which is available to authorized users.
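Selecting array SNPs for both uniform spacing and breed representation can be illustrated with a simple binning heuristic: divide each chromosome into fixed-width bins and keep the highest-scoring SNP in each bin. This is a hypothetical sketch, not the MNEc2M design procedure. The score stands in for an assumed breed-representation measure, for example the number of breeds in which a SNP is polymorphic, and the bin width would in practice be derived from the target SNP count and the genome length.

```python
from typing import Dict, List, Tuple


def select_evenly_spaced(snps: List[Tuple[int, float]], bin_width: int) -> List[int]:
    """Keep the highest-scoring SNP in each fixed-width bin of one chromosome.

    snps: (position, score) pairs; score is a hypothetical breed-representation
    measure. On ties, the SNP listed first is kept.
    """
    best: Dict[int, Tuple[int, float]] = {}
    for pos, score in snps:
        bin_index = pos // bin_width
        if bin_index not in best or score > best[bin_index][1]:
            best[bin_index] = (pos, score)
    return sorted(pos for pos, _ in best.values())


# Hypothetical SNPs: (position in bp, number of breeds in which polymorphic)
snps = [(100, 3), (900, 5), (1500, 2), (1999, 2), (2500, 7)]
print(select_evenly_spaced(snps, bin_width=1000))  # [900, 1500, 2500]
```

Binning guarantees at most one SNP per bin, but two selected SNPs can still fall closer together than the bin width across a bin boundary (900 and 1500 above), and bins without any candidate SNP leave gaps.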
Landscape genetics is an emerging discipline that utilizes environmental and historical data to understand geographic patterns of genetic diversity. Niche modelling has added a new dimension to such efforts by allowing species-environment associations to be projected into the past so that hypotheses about historical vicariance can be generated and tested independently with genetic data. However, previous approaches have primarily utilized DNA sequence data to test inferences about historical isolation and may have missed very recent episodes of environmentally mediated divergence. We type 15 microsatellite loci in California mule deer and identify five genetic groupings through a Structure analysis that are also well predicted by environmental data. We project the niches of these five deer ecotypes to the last glacial maximum (LGM) and show that they overlap to a much greater extent than today, suggesting that vicariance associated with the LGM cannot explain the present-day genetic patterns. Further, we analyse mitochondrial DNA (mtDNA) sequence trees to search for evidence of historical vicariance and find only two well-supported clades. A coalescence-based analysis of mtDNA data shows that the genetic divergence of the mule deer genetic clusters in California is recent and appears to be mediated by ecological factors. The importance of environmental factors in explaining the genetic diversity of California mule deer is unexpected, given that mule deer are highly mobile and have a broad habitat distribution. Geographic differences in the timing of reproduction and peak vegetation, as well as habitat choice reflecting natal origin, may explain the persistence of genetic subdivision.
Impaired acrosomal reaction (IAR) of sperm causes male subfertility in humans and animals. Despite compelling evidence for genetic control of acrosome biogenesis and function, the genomics of IAR is still poorly understood, and no molecular diagnostic tools are yet available. Here we conducted Equine SNP50 BeadChip genotyping and a GWAS using 7 IAR-affected and 37 control Thoroughbred stallions. A significant (P<6.75E-08) genotype–phenotype association was found on horse chromosome 13 in FK506 binding protein 6 (FKBP6). The gene belongs to the immunophilin FKBP family, whose members are known to be involved in meiosis, calcium homeostasis, clathrin-coated vesicles, and membrane fusion. Direct sequencing of FKBP6 exons in cases and controls identified SNPs g.11040315G>A and g.11040379C>A (p.166H>N) in exon 4 that were significantly associated with the IAR phenotype both in the GWAS cohort (n = 44) and in a large multi-breed cohort of 265 horses. All IAR stallions were homozygous for the A alleles, whereas this genotype was found in only 2% of controls. Equine FKBP6 was expressed exclusively in testis and sperm and had 5 different transcripts, 4 of which were novel. Expression of this gene in AC/AG heterozygous controls was monoallelic, and we observed a tendency toward FKBP6 up-regulation in IAR stallions compared with controls. Because the exon 4 SNPs had no effect on protein structure, FKBP6 likely relates to the IAR phenotype through regulatory or modifying functions. In conclusion, FKBP6 was considered a susceptibility gene of incomplete penetrance for IAR in stallions and a candidate gene for male subfertility in mammals. FKBP6 genotyping is recommended for detecting IAR-susceptible individuals among potential breeding stallions. The successful use of sperm as a source of DNA and RNA supports non-invasive sample procurement for fertility genomics in animals and humans.