Accurately determining the distribution of rare variants is an important goal of human genetics, but resequencing of a sample large enough for this purpose has been unfeasible until now. Here, we applied Sanger sequencing of genomic PCR amplicons to resequence the diabetes-associated genes KCNJ11 and HHEX in 13,715 people (10,422 European Americans and 3,293 African Americans) and validated amplicons potentially harbouring rare variants using 454 pyrosequencing. We observed far more variation (expected variant-site count ∼578) than would have been predicted on the basis of earlier surveys, which could only capture the distribution of common variants. By comparison with earlier estimates based on common variants, our model shows a clear genetic signal of accelerating population growth, suggesting that humanity harbours a myriad of rare, deleterious variants, and that disease risk and the burden of disease in contemporary populations may be heavily influenced by the distribution of rare variants.
Human populations have experienced dramatic growth since the Neolithic revolution. Recent studies that sequenced a very large number of individuals observed an extreme excess of rare variants and provided clear evidence of recent rapid growth in effective population size, although estimates have varied greatly among studies. All these studies were based on protein-coding genes, in which variants are also impacted by natural selection. In this study, we introduce targeted sequencing data for studying recent human history with minimal confounding by natural selection. We sequenced loci far from genes that meet a wide array of additional criteria such that mutations in these loci are putatively neutral. As population structure also skews allele frequencies, we sequenced 500 individuals of relatively homogeneous ancestry by first analyzing the population structure of 9,716 European Americans. We used very high coverage sequencing to reliably call rare variants and fit an extensive array of models of recent European demographic history to the site frequency spectrum. The best-fit model estimates ∼3.4% growth per generation during the last ∼140 generations, resulting in a population size increase of two orders of magnitude. This model fits the data very well, largely due to our observation that assumptions of more ancient demography can impact estimates of recent growth. This observation and results also shed light on the discrepancy in demographic estimates among recent studies.

Archeological and historical records reveal that modern human populations have experienced dramatic growth, likely driven by the Neolithic revolution about 10,000 y ago (1, 2). Since then, the worldwide human population size has increased at a fast pace, and faster yet in the last ∼2,000 y, giving rise to today's population in excess of 7 billion people (3, 4).
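As a quick arithmetic check on the growth figures quoted in the abstract above, the following minimal sketch compounds a constant per-generation growth rate. It assumes simple exponential growth at a fixed rate, which is a simplification for illustration and not necessarily the exact parameterization of the fitted model; the function name `fold_increase` is hypothetical.

```python
import math


def fold_increase(rate_per_generation: float, generations: int) -> float:
    """Total multiplicative increase under a constant per-generation growth rate.

    Assumption: plain compounding, N_t = N_0 * (1 + r)**t.
    """
    return (1.0 + rate_per_generation) ** generations


# Values quoted in the abstract: ~3.4% growth per generation over ~140 generations.
growth = fold_increase(0.034, 140)
orders_of_magnitude = math.log10(growth)

print(f"fold increase: {growth:.1f}")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

Under this simplification the compounded increase comes out at roughly a hundredfold, which is consistent with the stated "two orders of magnitude".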
A central question in population genetics is how such demographic events affect the effective size (N_e) of populations over time, and as a consequence, how they have shaped extant patterns of genetic variation. [Effective population size, which is typically smaller than the census size, determines the genetic properties of a population (5).] Focusing often on human populations of European descent, estimates of N_e from genetic variation have been traditionally on the order of 10,000 individuals (6-11), although higher and lower estimates have also been obtained (12-16). More recent studies based on sequencing data from a relatively small number of individuals have considered recent population growth in fitting models to the observed site frequency spectrum (SFS) and reported as much as a 0.5% increase in N_e per generation, culminating in an N_e of a few tens of thousands today (13, 14). It has been recently hypothesized that these studies could not capture the full scope of population growth because a larger sample size of individuals is needed to observe single nucleotide variants (SNVs) that arose during the recent epoch of growth (4).

With extreme recent population growth as experie...
We present a highly accurate method for identifying genes with conserved RNA secondary structure by searching multiple sequence alignments of a large set of candidate orthologs for correlated arrangements of reverse-complementary regions. This approach is growing increasingly feasible as the genomes of ever more organisms are sequenced. A program called MSARI implements this method and is significantly more accurate than existing methods in the context of automatically generated alignments, making it particularly applicable to high-throughput scans. In our tests, it discerned CLUSTALW-generated multiple sequence alignments of signal recognition particle or RNaseP orthologs from controls with 89.1% sensitivity at 97.5% specificity and with 74.4% sensitivity with no false positives in 494 controls. We used MSARI to conduct a comprehensive scan for secondary structure in mRNAs of coding genes, and we found many genes with known mRNA secondary structure and compelling evidence for secondary structure in other genes. MSARI uses a method for coping with sequence redundancy that is likely to have applications in a large set of other comparison-based search methods. The program is available for download from http://theory.csail.mit.edu/MSARi.

The structure of RNA is to a large extent determined by cis base pairing (AU, GC, and GU). This base pairing is referred to as secondary structure. A noncoding RNA (ncRNA) (1) gene expresses RNA that is never translated into protein but is nonetheless biologically significant. Examples of such genes are tRNAs and XIST, which in mammalian females suppresses expression of genes on one of the two X chromosomes (2-4). RNA secondary structure in mRNAs can also be biologically significant, controlling timing and localization of protein expression (5). Identifying such secondary structure will be crucial to a complete understanding of cellular biology (6).

Most work on identifying RNA secondary structure has been in the context of searching for ncRNA genes.
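To make the idea of base pairing conserved across an alignment concrete, here is a minimal sketch that checks whether two alignment columns can pair (AU, GC, or GU) in each sequence. It is not MSARI's statistic or algorithm; the function names and the toy alignment are hypothetical and chosen only for illustration.

```python
# Canonical and wobble pairs named in the text: AU, GC, and GU (either orientation).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    """True if bases a and b can form an AU, GC, or GU pair."""
    return (a.upper(), b.upper()) in PAIRS


def pairing_fraction(alignment: list[str], i: int, j: int) -> float:
    """Fraction of aligned sequences whose columns i and j could base-pair.

    Gaps or other symbols simply count as non-pairing.
    """
    hits = sum(can_pair(seq[i], seq[j]) for seq in alignment)
    return hits / len(alignment)


def distinct_pair_types(alignment: list[str], i: int, j: int) -> set[tuple[str, str]]:
    """Distinct pairing combinations seen at columns i and j.

    Several distinct combinations at a fully pairing column pair indicate
    compensatory changes: both bases differ, yet pairing is preserved.
    """
    return {(seq[i], seq[j]) for seq in alignment if can_pair(seq[i], seq[j])}


# Hypothetical toy alignment (equal-length RNA sequences, 0-based columns).
toy_alignment = [
    "GGCAUUCGCC",
    "GACAUUCGUC",
    "GCCAAUAGGC",
    "GUCAUACGGC",
]

print(pairing_fraction(toy_alignment, 1, 8))
print(sorted(distinct_pair_types(toy_alignment, 1, 8)))
```

In this toy data, columns 1 and 8 pair in every sequence while using several different base combinations, whereas columns 0 and 9 pair everywhere with a single unchanged combination. Only the former pattern provides comparative evidence for structure, since an invariant pair could reflect sequence conservation alone.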
Some approaches to automated identification of ncRNA genes have focused on searching for a recognizable secondary structure associated with RNA transcripts serving a specific biological function. One example of this type of program is Eddy and coworkers' program (8), which searches for tRNAs. Others are Regalia et al.'s search for signal recognition particles (9) and Rhoades et al.'s search for microRNAs (10).

Automatically identifying novel biologically significant RNA secondary structure has proven to be difficult. By itself, RNA secondary structure in stand-alone genes is not particularly amenable to computer-based recognition methods, as many RNA sequences seem to have thermodynamically plausible secondary structures of no biological relevance (11). Moreover, ncRNA genes cannot be discerned by using standard computational gene detection algorithms, which are targeted at genes that express proteins and rely heavily on locating stop codons and other protein-specific guides (12-17).

Comparative methods provide a way to cut through the abundance of plausible, bu...