Over the past 20 y, many studies have examined the history of the plant ecological and molecular model, Arabidopsis thaliana, in Europe and North America. Although these studies informed us about the recent history of the species, the early history has remained elusive. In a large-scale genomic analysis of African A. thaliana, we sequenced the genomes of 78 modern and herbarium samples from Africa and analyzed these together with over 1,000 previously sequenced Eurasian samples. In striking contrast to expectations, we find that all African individuals sampled are native to this continent, including those from sub-Saharan Africa. Moreover, we show that Africa harbors the greatest variation and represents the deepest history in the A. thaliana lineage. Our results also reveal evidence that selfing, a major defining characteristic of the species, evolved in a single geographic region, best represented today within Africa. Demographic inference supports a model in which the ancestral A. thaliana population began to split by 120-90 kya, during the last interglacial and Abbassia pluvial, and Eurasian populations subsequently separated from one another at around 40 kya. This bears striking similarities to the patterns observed for diverse species, including humans, implying a key role for climatic events during interglacial and pluvial periods in shaping the histories and current distributions of a wide range of species.

The plant Arabidopsis thaliana is the principal plant model species, and as such has been useful not only to examine basic biological mechanisms but also to elucidate evolutionary processes. The exceptional resources available in this species, including seed stocks collected from throughout Eurasia for over 75 y, have been a valuable tool for learning about the natural history of A. thaliana on this continent (1, 2).
Previous studies have shown that current variation in Eurasia is mainly a result of expansions and mixing from refugia in Iberia, Central Asia, and Italy/Balkans after the end of the last glacial period ∼10 kya (3-8). The main finding of the recent analysis of 1,135 sequenced genomes was that a few Eurasian samples represent divergent relict lineages, whereas the vast majority derived from the recent expansion of a single clade (4). Given the large number of studies that examine the natural history of A. thaliana, one would expect that this history would by now be described rather completely and there would be no major surprises left to uncover. However, there are still many open questions about the ancient history of the species.

Several features differentiate A. thaliana from its closest relatives. Although most members of the Arabidopsis genus are obligate outcrossing perennials with large flowers and genome sizes of over 230 Mb and 8 chromosomes, A. thaliana is a predominantly selfing annual with reduced floral morphology and a reduced genome size of ∼150 Mb and 5 chromosomes. The transition to predominant selfing in A. thaliana was likely the catalyst for these derived morphological and...
Heritability is a central parameter in quantitative genetics, from both an evolutionary and a breeding perspective. For plant traits heritability is traditionally estimated by comparing within- and between-genotype variability. This approach estimates broad-sense heritability and does not account for different genetic relatedness. With the availability of high-density markers there is growing interest in marker-based estimates of narrow-sense heritability, using mixed models in which genetic relatedness is estimated from genetic markers. Such estimates have received much attention in human genetics but are rarely reported for plant traits. A major obstacle is that current methodology and software assume a single phenotypic value per genotype, hence requiring genotypic means. An alternative that we propose here is to use mixed models at the individual plant or plot level. Using statistical arguments, simulations, and real data we investigate the feasibility of both approaches and how these affect genomic prediction with the best linear unbiased predictor and genome-wide association studies. Heritability estimates obtained from genotypic means had very large standard errors and were sometimes biologically unrealistic. Mixed models at the individual plant or plot level produced more realistic estimates, and for simulated traits standard errors were up to 13 times smaller. Genomic prediction was also improved by using these mixed models, with up to a 49% increase in accuracy. For genome-wide association studies on simulated traits, the use of individual plant data gave almost no increase in power. The new methodology is applicable to any complex trait where multiple replicates of individual genotypes can be scored. This includes important agronomic crops, as well as bacteria and fungi.

KEYWORDS marker-based estimation of heritability; GWAS; genomic prediction; Arabidopsis thaliana; one- vs. two-stage approaches

NARROW-SENSE heritability is an important parameter in quantitative genetics, determining the response to selection and representing the proportion of phenotypic variance that is due to additive genetic effects (Jacquard 1983; Ritland 1996; Visscher et al. 2006, 2008; Holland et al. 2010; Sillanpaa 2011). This definition of heritability goes back to Fisher (1918) and Wright (1920) almost a century ago. In plant species for which replicates of the same genotype are available (inbred lines, doubled haploids, clones), a different form of heritability, broad-sense heritability, is traditionally estimated by the intraclass correlation coefficient for genotypic effects, using estimates for within- and between-genotype variance. Broad-sense heritability is also referred to as repeatability and gives the proportion of phenotypic variance explained by heritable (additive) and nonheritable (dominance, epistasis) genetic variance.

With the arrival of high-density genotyping there is growing interest in marker-based estimation of narrow-sense heritability (WTCCC 2007; Yang et al. 2010, 2011; Vatti...
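
The traditional broad-sense estimate mentioned in this excerpt, the intraclass correlation built from within- and between-genotype variance, can be illustrated with a minimal sketch. It assumes a balanced design (every genotype has the same number of replicates) and uses the standard one-way ANOVA variance-component estimators; the function name `broad_sense_heritability` and the simulated data are hypothetical and are not taken from the paper, and this is not the marker-based mixed-model approach the authors propose.

```python
import numpy as np


def broad_sense_heritability(y):
    """Intraclass correlation for genotypic effects in a balanced design.

    y: 2-D array of phenotypes, shape (n_genotypes, n_replicates).
    Returns var_G / (var_G + var_E) at the level of a single plant or plot.
    """
    y = np.asarray(y, dtype=float)
    g, r = y.shape
    geno_means = y.mean(axis=1)
    grand_mean = y.mean()
    # Between-genotype and within-genotype mean squares (one-way ANOVA).
    ms_between = r * np.sum((geno_means - grand_mean) ** 2) / (g - 1)
    ms_within = np.sum((y - geno_means[:, None]) ** 2) / (g * (r - 1))
    # Method-of-moments variance components; truncate negative estimates at 0.
    var_g = max((ms_between - ms_within) / r, 0.0)
    var_e = ms_within
    return var_g / (var_g + var_e)


# Hypothetical simulated trait: 200 genotypes, 5 replicates each,
# with equal genetic and residual variance.
rng = np.random.default_rng(1)
genetic = rng.normal(0.0, 1.0, size=(200, 1))
residual = rng.normal(0.0, 1.0, size=(200, 5))
print(broad_sense_heritability(genetic + residual))
```

Because this estimator uses only the grouping of replicates into genotypes, it ignores relatedness between genotypes, which is the limitation of the broad-sense approach that the abstract points out.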
Introduction
Batch effects in large untargeted metabolomics experiments are almost unavoidable, especially when sensitive detection techniques like mass spectrometry (MS) are employed. In order to obtain peak intensities that are comparable across all batches, corrections need to be performed. Since non-detects, i.e., signals with an intensity too low to be detected with certainty, are common in metabolomics studies, the batch correction methods need to take these into account.

Objectives
This paper aims to compare several batch correction methods, and investigates the effect of different strategies for handling non-detects.

Methods
Batch correction methods usually consist of regression models, possibly also accounting for trends within batches. To fit these models quality control samples (QCs), injected at regular intervals, can be used. Also study samples can be used, provided that the injection order is properly randomized. Normalization methods, not using information on batch labels or injection order, can correct for batch effects as well. Introducing two easy-to-use quality criteria, we assess the merits of these batch correction strategies using three large LC–MS and GC–MS data sets of samples from Arabidopsis thaliana.

Results
The three data sets have very different characteristics, leading to clearly distinct behaviour of the batch correction strategies studied. Explicit inclusion of information on batch and injection order in general leads to very good corrections; when enough QCs are available, also general normalization approaches perform well. Several approaches are shown to be able to handle non-detects; replacing them with very small numbers such as zero seems the worst of the approaches considered.

Conclusion
The use of quality control samples for batch correction leads to good results when enough QCs are available. If an experiment is properly set up, batch correction using the study samples usually leads to a similar high-quality correction, but has the advantage that more metabolites are corrected. The strategy for handling non-detects is important: choosing small values like zero can lead to suboptimal batch corrections.
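
As an illustration of the regression-based correction described in the Methods, the following minimal sketch fits a straight line of intensity against injection order to the QC samples of each batch, subtracts that trend from every sample in the batch, and re-centres on the overall QC mean. The function name `qc_batch_correct`, the linear trend, the rule of skipping batches with fewer than two usable QCs, and the choice to treat non-detects as missing values (NaN) rather than as zero are all assumptions made for this example; they are one possible instance of the general approach, not the specific methods compared in the paper.

```python
import numpy as np


def qc_batch_correct(intensity, batch, order, is_qc):
    """Per-batch linear drift correction for one metabolite, fitted on QCs.

    intensity: peak intensities (ideally log-scale); NaN marks a non-detect.
    batch:     batch label of each injection.
    order:     injection order of each injection.
    is_qc:     True where the injection is a quality control sample.
    """
    intensity = np.asarray(intensity, dtype=float)
    batch = np.asarray(batch)
    order = np.asarray(order, dtype=float)
    is_qc = np.asarray(is_qc, dtype=bool)

    corrected = intensity.copy()
    # Overall QC mean, used to put all batches on a common level.
    reference = np.nanmean(intensity[is_qc])

    for b in np.unique(batch):
        in_batch = batch == b
        # Non-detects (NaN) are left out of the fit instead of being set to 0.
        fit_mask = in_batch & is_qc & ~np.isnan(intensity)
        if fit_mask.sum() < 2:
            continue  # too few QCs for a line; batch is left uncorrected
        slope, intercept = np.polyfit(order[fit_mask], intensity[fit_mask], 1)
        trend = slope * order[in_batch] + intercept
        corrected[in_batch] = intensity[in_batch] - trend + reference

    return corrected
```

The same function could be fitted on study samples instead of QCs by passing a mask that selects them, which is only sensible when the injection order is properly randomized, as the abstract notes. Non-detects stay NaN in the output, so a separate imputation step would be needed if downstream analyses require complete data.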