We develop a Bayesian probabilistic framework for microarray data analysis. At the simplest level, we model log-expression values by independent normal distributions, parameterized by corresponding means and variances with hierarchical prior distributions. We derive point estimates for both parameters and hyperparameters, and regularized expressions for the variance of each gene by combining the empirical variance with a local background variance associated with neighboring genes. An additional hyperparameter, inversely related to the number of empirical observations, determines the strength of the background variance. Simulations show that these point estimates, combined with a t-test, provide a systematic inference approach that compares favorably with simple t-test or fold methods, and partly compensate for the lack of replication.
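The idea of blending a gene's empirical variance with a background variance can be illustrated with a minimal Python sketch. The abstract does not give the exact expression, so the weighted average below is an assumption chosen for clarity, and the names `regularized_variance`, `regularized_t`, `background_var`, and `nu0` are hypothetical. Here `nu0` plays the role of the extra hyperparameter: it acts like a count of pseudo-observations, so it matters most when there are few replicates.

```python
import numpy as np


def regularized_variance(values, background_var, nu0):
    """Blend one gene's empirical variance with a background variance.

    ASSUMPTION: a simple weighted average, not the paper's exact formula.
    nu0 (hypothetical name) weights the background like pseudo-observations;
    nu0 = 0 returns the ordinary sample variance.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    if n < 2:
        raise ValueError("need at least two replicates")
    if nu0 < 0:
        raise ValueError("nu0 must be non-negative")
    s2 = values.var(ddof=1)  # unbiased empirical variance
    return (nu0 * background_var + (n - 1) * s2) / (nu0 + n - 1)


def regularized_t(control, treatment, bg_control, bg_treatment, nu0):
    """Two-sample t-like statistic computed with the regularized variances."""
    control = np.asarray(control, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    v_c = regularized_variance(control, bg_control, nu0)
    v_t = regularized_variance(treatment, bg_treatment, nu0)
    standard_error = np.sqrt(v_c / control.size + v_t / treatment.size)
    return (treatment.mean() - control.mean()) / standard_error


if __name__ == "__main__":
    # Log-expression values for one gene, three replicates per condition.
    print(regularized_t([1.0, 2.0, 3.0], [3.0, 4.0, 5.0], 1.0, 1.0, nu0=2))
```

With only two or three replicates, the empirical variance alone is very noisy; a larger `nu0` pulls the estimate toward the background value, which is the sense in which regularization can partly compensate for limited replication.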
To estimate the number and diversity of beneficial mutations, we experimentally evolved 115 populations of Escherichia coli to 42.2°C for 2000 generations and sequenced one genome from each population. We identified 1331 total mutations, affecting more than 600 different sites. Few mutations were shared among replicates, but a strong pattern of convergence emerged at the level of genes, operons, and functional complexes. Our experiment uncovered a set of primary functional targets of high temperature, but we estimate that many other beneficial mutations could contribute to similar adaptive outcomes. We inferred the pervasive presence of epistasis among beneficial mutations, which shaped adaptive trajectories into at least two distinct pathways involving mutations either in the RNA polymerase complex or the termination factor rho.
We measured sequence diversity in 21 loci distributed along chromosome 1 of maize (Zea mays ssp. mays L.). For each locus, we sequenced a common sample of 25 individuals representing 16 exotic landraces and nine U.S. inbred lines. The data indicated that maize has an average of one single nucleotide polymorphism (SNP) every 104 bp between two randomly sampled sequences, a level of diversity higher than that of either humans or Drosophila melanogaster. A comparison of genetic diversity between the landrace and inbred samples showed that inbreds retained 77% of the level of diversity of landraces, on average. In addition, Tajima's D values suggest that the frequency distribution of polymorphisms in inbreds was skewed toward fewer rare variants. Tests for selection were applied to all loci, and deviations from neutrality were detected in three loci. Sequence diversity was heterogeneous among loci, but there was no pattern of diversity along the genetic map of chromosome 1. Nonetheless, diversity was correlated (r = 0.65) with sequence-based estimates of the recombination rate. Recombination in our sample was sufficient to break down linkage disequilibrium among SNPs. Intragenic linkage disequilibrium declines within 100-200 bp on average, suggesting that genome-wide surveys for association analyses require SNPs every 100-200 bp. Single nucleotide polymorphisms (SNPs) are valuable tools for mapping complex phenotypic traits. An SNP either can contribute directly to a phenotype or it can associate with a phenotype as a result of linkage disequilibrium (LD) (1). In either case, it is clear that successful utilization of SNPs requires detailed knowledge of patterns of genetic polymorphism throughout the genome, as well as an understanding of the evolutionary forces shaping those patterns.
These forces include genomic factors, such as the distribution of recombination and mutation rates along chromosomes, and evolutionary factors, such as the history of natural selection and population demography (2). Thus far, SNPs have been surveyed extensively for evolutionary purposes in relatively few systems. The surveys have yielded four important observations about DNA sequence diversity. First, diversity varies among species; for example, Drosophila melanogaster (drosophila) is ≈8- to 13-fold more diverse at the DNA sequence level than humans (3). Second, the effects of natural selection and demography vary among species. Half of the loci examined in drosophila do not fit the neutral equilibrium model of evolution (4), but only 1 of 16 loci analyzed in humans deviates from the neutral model (2). Third, SNPs provide insights into population history and demography. In humans, for example, African populations contain more genetic diversity than non-African populations, and non-
Experimental evolution systems allow the genomic study of adaptation, and so far this has been done primarily in asexual systems with small genomes, such as bacteria and yeast. Here we present whole-genome resequencing data from Drosophila melanogaster populations that have experienced over 600 generations of laboratory selection for accelerated development. Flies in these selected populations develop from egg to adult ∼20% faster than flies of ancestral control populations, and have evolved a number of other correlated phenotypes. On the basis of 688,520 intermediate-frequency, high-quality single nucleotide polymorphisms, we identify several dozen genomic regions that show strong allele frequency differentiation between a pooled sample of five replicate populations selected for accelerated development and pooled controls. On the basis of resequencing data from a single replicate population with accelerated development, as well as single nucleotide polymorphism data from individual flies from each replicate population, we infer little allele frequency differentiation between replicate populations within a selection treatment. Signatures of selection are qualitatively different than what has been observed in asexual species; in our sexual populations, adaptation is not associated with 'classic' sweeps whereby newly arising, unconditionally advantageous mutations become fixed. More parsimonious explanations include 'incomplete' sweep models, in which mutations have not had enough time to fix, and 'soft' sweep models, in which selection acts on pre-existing, common genetic variants. We conclude that, at least for life history characters such as development time, unconditionally advantageous alleles rarely arise, are associated with small net fitness gains or cannot fix because selection coefficients change over time.
Genetic dissection of complex, polygenic trait variation is a key goal of medical and evolutionary genetics. Attempts to identify genetic variants underlying complex traits have been plagued by low mapping resolution in traditional linkage studies, and an inability to identify variants that cumulatively explain the bulk of standing genetic variation in genome-wide association studies (GWAS). Thus, much of the heritability remains unexplained for most complex traits. Here we describe a novel, freely available resource for the Drosophila community consisting of two sets of recombinant inbred lines (RILs), each derived from an advanced generation cross between a different set of eight highly inbred, completely resequenced founders. The Drosophila Synthetic Population Resource (DSPR) has been designed to combine the high mapping resolution offered by multiple generations of recombination, with the high statistical power afforded by a linkage-based design. Here, we detail the properties of the mapping panel of >1600 genotyped RILs, and provide an empirical demonstration of the utility of the approach by genetically dissecting alcohol dehydrogenase (ADH) enzyme activity. We confirm that a large fraction of the variation in this classic quantitative trait is due to allelic variation at the Adh locus, and additionally identify several previously unknown modest-effect trans-acting QTL (quantitative trait loci). Using a unique property of multiparental linkage mapping designs, for each QTL we highlight a relatively small set of candidate causative variants for follow-up work. The DSPR represents an important step toward the ultimate goal of a complete understanding of the genetics of complex traits in the Drosophila model system.
The Drosophila Synthetic Population Resource (DSPR) is a newly developed multifounder advanced intercross panel consisting of >1600 recombinant inbred lines (RILs) designed for the genetic dissection of complex traits. Here, we describe the inference of the underlying mosaic founder structure for the full set of RILs from a dense set of semicodominant restriction-site-associated DNA (RAD) markers and use simulations to explore how variation in marker density and sequencing coverage affects inference. For a given sequencing effort, marker density is more important than sequence coverage per marker in terms of the amount of genetic information we can infer. We also assessed the power of the DSPR by assigning genotypes at a hidden QTL to each RIL on the basis of the inferred founder state and simulating phenotypes for different experimental designs, different genetic architectures, different sample sizes, and QTL of varying effect sizes. We found the DSPR has both high power (e.g., 84% power to detect a 5% QTL) and high mapping resolution (e.g., ∼1.5 cM for a 5% QTL). The ultimate goal of modern genetics is to determine how molecular genetic variation is translated into organismal phenotypes. The vast majority of continuously varying phenotypes are influenced by many genetic variants that often interact with one another and with environmental factors (Falconer and Mackay 1996; Roff 1997; Lynch and Walsh 1998). This underlying complexity has made identifying causative genetic variants for most traits a steep challenge for which the scientific community has only had limited, albeit increasing, success (Mackay 2001; Chanock et al. 2007; Wellcome Trust Case Control Consortium 2007; McCarthy et al. 2008; Stranger et al. 2011). As a result, there is a large discrepancy between the known heritability of most traits and the fraction of that heritability that can be explained by known causative genetic variants (Manolio et al. 2009; Stranger et al. 2011).
This discrepancy has spurred the development of new mapping panels designed to address the shortcomings of existing genome-wide association studies and QTL mapping panels derived from only two parents. The Drosophila Synthetic Population Resource (DSPR) is one such panel (King et al. 2012), similar in concept to other available linkage-based resources: the mouse Collaborative Cross (Churchill et al. 2004; Aylor et al. 2011; Philip et al. 2011), the Arabidopsis multiparent recombinant inbred line population (AMPRIL) (Huang et al. 2011), the Arabidopsis multiparent advanced generation intercross lines (MAGIC) (Kover et al. 2009), and the maize nested association mapping population (NAM) (Yu et al. 2008; Buckler et al. 2009; McMullen et al. 2009; Li et al. 2011). The DSPR is a linkage-based panel that uses a synthetic population approach (Macdonald and Long 2007). To create the DSPR, two separate synthetic populations were created, each from a 50-generation intercross of 8 inbred founder lines, with one founder line shared between the two populations. From these two synthetic popula...
Although animals display a rich variety of shapes and patterns, the genetic changes that explain how complex forms arise are still unclear. Here we take advantage of the extensive diversity of Heliconius butterflies to identify a gene that causes adaptive variation of black wing patterns within and between species. Linkage mapping in two species groups, gene-expression analysis in seven species, and pharmacological treatments all indicate that cis-regulatory evolution of the WntA ligand underpins discrete changes in color pattern features across the Heliconius genus. These results illustrate how the direct modulation of morphogen sources can generate a wide array of unique morphologies, thus providing a link between natural genetic variation, pattern formation, and adaptation.
Müllerian mimicry | Wnt pathway | Mendelian genetics | evolutionary-developmental biology
We describe statistical methods based on the t test that can be conveniently used on high density array data to test for statistically significant differences between treatments. These t tests employ either the observed variance among replicates within treatments or a Bayesian estimate of the variance among replicates within treatments based on a prior estimate obtained from a local estimate of the standard deviation. The Bayesian prior allows statistical inference to be made from microarray data even when experiments are only replicated at nominal levels. We apply these new statistical tests to a data set that examined differential gene expression patterns in IHF 275, 29672-29684). These analyses identify a more biologically reasonable set of candidate genes than those identified using statistical tests not incorporating a Bayesian prior. We also show that statistical tests based on analysis of variance and a Bayesian prior identify genes that are up- or down-regulated following an experimental manipulation more reliably than approaches based only on a t test or fold change. All the described tests are implemented in a simple-to-use web interface called Cyber-T that is located on the University of California at Irvine genomics web site.
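A "local estimate of the standard deviation" can be sketched as follows, under stated assumptions. The abstract does not say how the local neighborhood is defined; this Python sketch assumes genes are ranked by mean expression and that the prior for each gene is the average standard deviation within a sliding window of similarly expressed genes. The function name `local_background_sd` and the `window` parameter are hypothetical, and this is not presented as Cyber-T's implementation.

```python
import numpy as np


def local_background_sd(means, sds, window=101):
    """Average the SDs of genes with similar mean expression.

    ASSUMPTION: the neighborhood is a window of `window` genes centered on
    each gene after ranking by mean expression; windows are truncated at
    the ends of the ranking.
    """
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    if means.shape != sds.shape:
        raise ValueError("means and sds must have the same shape")
    order = np.argsort(means, kind="stable")  # gene indices, low to high mean
    sorted_sds = sds[order]
    half = window // 2
    n = sds.size
    background = np.empty(n)
    for rank, gene in enumerate(order):
        lo = max(0, rank - half)
        hi = min(n, rank + half + 1)
        background[gene] = sorted_sds[lo:hi].mean()
    return background


if __name__ == "__main__":
    # Four genes; a window of 3 covers each gene and its nearest neighbors.
    print(local_background_sd([3.0, 1.0, 2.0, 4.0], [0.3, 0.1, 0.2, 0.4], window=3))
```

The returned values are in the original gene order, so they can be squared and used directly as per-gene background variances in a variance-regularized t test.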