BackgroundHigh-throughput DNA sequencing technologies are generating vast amounts of data. Fast, flexible and memory efficient implementations are needed in order to facilitate analyses of thousands of samples simultaneously.ResultsWe present a multithreaded program suite called ANGSD. This program can calculate various summary statistics, and perform association mapping and population genetic analyses utilizing the full information in next generation sequencing data by working directly on the raw sequencing data or by using genotype likelihoods.ConclusionsThe open source c/c++ program ANGSD is available at http://www.popgen.dk/angsd. The program is tested and validated on GNU/Linux systems. The program facilitates multiple input formats including BAM and imputed beagle genotype probability files. The program allow the user to choose between combinations of existing methods and can perform analysis that is not implemented elsewhere.Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-014-0356-4) contains supplementary material, which is available to authorized users.
Residents of the Tibetan Plateau show heritable adaptations to extreme altitude. We sequenced 50 exomes of ethnic Tibetans, encompassing coding sequences of 92% of human genes, with an average coverage of 18X per individual. Genes showing population-specific allele frequency changes, which represent strong candidates for altitude adaptation, were identified. The strongest signal of natural selection came from EPAS1, a transcription factor involved in response to hypoxia. One SNP at EPAS1 shows a 78% frequency difference between Tibetan and Han samples, representing the fastest allele frequency change observed at any human gene to date. This SNP’s association with erythrocyte abundance supports the role of EPAS1 in adaptation to hypoxia. Thus, a population genomic survey has revealed a functionally important locus in genetic adaptation to high altitude.
The rich fossil record of equids has made them a model for evolutionary processes. Here we present a 1.12-times coverage draft genome from a horse bone recovered from permafrost dated to approximately 560-780 thousand years before present (kyr BP). Our data represent the oldest full genome sequence determined so far by almost an order of magnitude. For comparison, we sequenced the genome of a Late Pleistocene horse (43 kyr BP), and modern genomes of five domestic horse breeds (Equus ferus caballus), a Przewalski's horse (E. f. przewalskii) and a donkey (E. asinus). Our analyses suggest that the Equus lineage giving rise to all contemporary horses, zebras and donkeys originated 4.0-4.5 million years before present (Myr BP), twice the conventionally accepted time to the most recent common ancestor of the genus Equus. We also find that horse population size fluctuated multiple times over the past 2 Myr, particularly during periods of severe climatic changes. We estimate that the Przewalski's and domestic horse populations diverged 38-72 kyr BP, and find no evidence of recent admixture between the domestic horse breeds and the Przewalski's horse investigated. This supports the contention that Przewalski's horses represent the last surviving wild horse population. We find similar levels of genetic variation among Przewalski's and domestic populations, indicating that the former are genetically viable and worthy of conservation efforts. We also find evidence for continuous selection on the immune system and olfaction throughout horse evolution. Finally, we identify 29 genomic regions among horse breeds that deviate from neutrality and show low levels of genetic variation compared to the Przewalski's horse. Such regions could correspond to loci selected early during domestication.
Inference of population structure and individual ancestry is important both for population genetics and for association studies. With next generation sequencing technologies it is possible to obtain genetic data for all accessible genetic variations in the genome. Existing methods for admixture analysis rely on known genotypes. However, individual genotypes cannot be inferred from low-depth sequencing data without introducing errors. This article presents a new method for inferring an individual's ancestry that takes the uncertainty introduced in next generation sequencing data into account. This is achieved by working directly with genotype likelihoods that contain all relevant information of the unobserved genotypes. Using simulations as well as publicly available sequenc-ing data, we demonstrate that the presented method has great accuracy even for very low-depth data. At the same time, we demonstrate that applying existing methods to genotypes called from the same data can introduce severe biases. The presented method is implemented in the NGSadmix software available at http://www.popgen.dk/software.
How and when the Americas were populated remains contentious. Using ancient and modern genome-wide data, we find that the ancestors of all present-day Native Americans, including Athabascans and Amerindians, entered the Americas as a single migration wave from Siberia no earlier than 23 thousand years ago (KYA), and after no more than 8,000-year isolation period in Beringia. Following their arrival to the Americas, ancestral Native Americans diversified into two basal genetic branches around 13 KYA, one that is now dispersed across North and South America and the other is restricted to North America. Subsequent gene flow resulted in some Native Americans sharing ancestry with present-day East Asians (including Siberians) and, more distantly, Australo-Melanesians. Putative ‘Paleoamerican’ relict populations, including the historical Mexican Pericúes and South American Fuego-Patagonians, are not directly related to modern Australo-Melanesians as suggested by the Paleoamerican Model.
It is commonly thought that human genetic diversity in non-African populations was shaped primarily by an out-of-Africa dispersal 50–100 thousand yr ago (kya). Here, we present a study of 456 geographically diverse high-coverage Y chromosome sequences, including 299 newly reported samples. Applying ancient DNA calibration, we date the Y-chromosomal most recent common ancestor (MRCA) in Africa at 254 (95% CI 192–307) kya and detect a cluster of major non-African founder haplogroups in a narrow time interval at 47–52 kya, consistent with a rapid initial colonization model of Eurasia and Oceania after the out-of-Africa bottleneck. In contrast to demographic reconstructions based on mtDNA, we infer a second strong bottleneck in Y-chromosome lineages dating to the last 10 ky. We hypothesize that this bottleneck is caused by cultural changes affecting variance of reproductive success among males.
We present a statistical framework for estimation and application of sample allele frequency spectra from New-Generation Sequencing (NGS) data. In this method, we first estimate the allele frequency spectrum using maximum likelihood. In contrast to previous methods, the likelihood function is calculated using a dynamic programming algorithm and numerically optimized using analytical derivatives. We then use a Bayesian method for estimating the sample allele frequency in a single site, and show how the method can be used for genotype calling and SNP calling. We also show how the method can be extended to various other cases including cases with deviations from Hardy-Weinberg equilibrium. We evaluate the statistical properties of the methods using simulations and by application to a real data set.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.