Although pioneered by human geneticists as a potential solution to the challenging problem of finding the genetic basis of common human diseases [1,2], genome-wide association (GWA) studies have, thanks to advances in genotyping and sequencing technology, become an obvious general approach for studying the genetics of natural variation and traits of agricultural importance. They are particularly useful when inbred lines are available because once these lines have been genotyped, they can be phenotyped multiple times, making it possible (as well as extremely cost-effective) to study many different traits in many different environments, while replicating the phenotypic measurements to reduce environmental noise. Here we demonstrate the power of this approach by carrying out a GWA study of 107 phenotypes in Arabidopsis thaliana, a widely distributed, predominantly selfing model plant known to harbor considerable genetic variation for many adaptively important traits [3]. Our results are dramatically different from those of human GWA studies in that we identify many common alleles of major effect, but they are also, in many cases, harder to interpret because confounding by complex genetics and population structure makes it difficult to distinguish true from false associations. However, a priori candidates are significantly overrepresented among these associations as well, making many of them excellent candidates for follow-up experiments by the Arabidopsis community. Our study clearly demonstrates the feasibility of GWA studies in A. thaliana, and suggests that the approach will be appropriate for many other organisms.
A potentially serious disadvantage of association mapping is the fact that marker-trait associations may arise from confounding population structure as well as from linkage to causative polymorphisms. Using genome-wide marker data, we have previously demonstrated that the problem can be severe in a global sample of 95 Arabidopsis thaliana accessions, and that established methods for controlling for population structure are generally insufficient. Here, we use the same sample together with a number of flowering-related phenotypes and data-perturbation simulations to evaluate a wider range of methods for controlling for population structure. We find that, in terms of reducing the false-positive rate while maintaining statistical power, a recently introduced mixed-model approach that takes genome-wide differences in relatedness into account via estimated pairwise kinship coefficients generally performs best. By combining the association results with results from linkage mapping in F2 crosses, we identify one previously known true positive and several promising new associations, but also demonstrate the existence of both false positives and false negatives. Our results illustrate the potential of genome-wide association scans as a tool for dissecting the genetics of natural variation, while at the same time highlighting the pitfalls. The importance of study design is clear; our study is severely under-powered both in terms of sample size and marker density. Our results also provide a striking demonstration of confounding by population structure. While statistical methods can be used to ameliorate this problem, they cannot always be effective and are certainly not a substitute for independent evidence, such as that obtained via crosses or transgenic experiments. Ultimately, association mapping is a powerful tool for identifying a list of candidates that is short enough to permit further genetic study.
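The mixed-model idea described above can be sketched roughly as follows. This is a minimal illustration, not the implementation used in the study: the kinship estimator (`centered_kinship`), the grid search over the variance ratio `delta`, and the Wald test are all assumptions chosen for brevity, and every function name here is hypothetical. The phenotype is modeled as y = Xb + u + e, with u ~ N(0, sigma_g^2 K) and e ~ N(0, sigma_e^2 I), where K is the pairwise kinship matrix and delta = sigma_e^2 / sigma_g^2.

```python
import math
import numpy as np

def centered_kinship(G):
    # Assumption: a simple centered cross-product kinship matrix,
    # not necessarily the estimator used in the study.
    Z = G - G.mean(axis=0)
    return Z @ Z.T / G.shape[1]

def _gls_fit(yr, Xr, s, delta):
    # Generalized least squares in the eigenbasis of K, where the
    # covariance is diagonal with entries proportional to (s + delta).
    w = 1.0 / (s + delta)
    XtWX = Xr.T @ (w[:, None] * Xr)
    beta = np.linalg.solve(XtWX, Xr.T @ (w * yr))
    resid = yr - Xr @ beta
    sigma_g2 = np.sum(w * resid ** 2) / len(yr)
    return beta, sigma_g2, XtWX

def mixed_model_scan(y, G, K, deltas=None):
    """Return per-marker Wald p-values and the variance ratio chosen under the null."""
    if deltas is None:
        deltas = np.logspace(-3, 3, 61)  # hypothetical search grid
    n = len(y)
    s, U = np.linalg.eigh(K)
    s = np.clip(s, 0.0, None)            # guard against tiny negative eigenvalues
    yr = U.T @ y                          # rotate phenotype
    ones_r = (U.T @ np.ones(n))[:, None]  # rotated intercept
    Gr = U.T @ G                          # rotated genotypes

    # Pick delta once by maximum likelihood under the intercept-only model.
    best_delta, best_nll = None, np.inf
    for d in deltas:
        _, sg2, _ = _gls_fit(yr, ones_r, s, d)
        nll = 0.5 * (n * np.log(sg2) + np.sum(np.log(s + d)))
        if nll < best_nll:
            best_delta, best_nll = d, nll

    # Test each marker with a one-degree-of-freedom Wald statistic.
    pvals = np.empty(G.shape[1])
    for j in range(G.shape[1]):
        Xr = np.hstack([ones_r, Gr[:, [j]]])
        beta, sg2, XtWX = _gls_fit(yr, Xr, s, best_delta)
        var_beta = sg2 * np.linalg.inv(XtWX)[1, 1]
        chi2 = beta[1] ** 2 / var_beta
        pvals[j] = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square(1) tail
    return pvals, best_delta

# Usage on simulated inbred-line genotypes coded 0/1 (made-up data).
rng = np.random.default_rng(0)
G = rng.integers(0, 2, size=(80, 50)).astype(float)
y = 3.0 * G[:, 7] + rng.normal(size=80)
pvals, delta = mixed_model_scan(y, G, centered_kinship(G))
```

Fitting the variance ratio once under the null and reusing it for every marker keeps the scan cheap; re-estimating it per marker would be closer to an exact fit but is considerably slower.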
There is currently tremendous interest in the possibility of using genome-wide association mapping to identify genes responsible for natural variation, particularly for human disease susceptibility. The model plant Arabidopsis thaliana is in many ways an ideal candidate for such studies, because it is a highly selfing hermaphrodite. As a result, the species largely exists as a collection of naturally occurring inbred lines, or accessions, which can be genotyped once and phenotyped repeatedly. Furthermore, linkage disequilibrium in such a species will be much more extensive than in a comparable outcrossing species. We tested the feasibility of genome-wide association mapping in A. thaliana by searching for associations with flowering time and pathogen resistance in a sample of 95 accessions for which genome-wide polymorphism data were available. In spite of an extremely high rate of false positives due to population structure, we were able to identify known major genes for all phenotypes tested, thus demonstrating the potential of genome-wide association mapping in A. thaliana and other species with similar patterns of variation. The rate of false positives differed strongly between traits, with more clinal traits showing the highest rate. However, the false positive rates were always substantial regardless of the trait, highlighting the necessity of an appropriate genomic control in association studies.
Life-history traits controlling the duration and timing of developmental phases in the life cycle jointly determine fitness. Therefore, life-history traits studied in isolation provide an incomplete view of the relevance of life-cycle variation for adaptation. In this study, we examine genetic variation in traits covering the major life-history events of the annual species Arabidopsis thaliana: seed dormancy, vegetative growth rate and flowering time. In a sample of 112 genotypes collected throughout the European range of the species, both seed dormancy and flowering time follow a latitudinal gradient independent of the major population structure gradient. This finding confirms previous studies reporting the adaptive evolution of these two traits. Here, however, we further analyze patterns of co-variation among traits. We observe that co-variation between primary dormancy, vegetative growth rate and flowering time also follows a latitudinal cline. At higher latitudes, vegetative growth rate is positively correlated with primary dormancy and negatively with flowering time. In the South, this trend disappears. Patterns of trait co-variation change, presumably because major environmental gradients shift with latitude. This pattern appears unrelated to population structure, suggesting that changes in the coordinated evolution of major life-history traits are adaptive. Our data suggest that A. thaliana provides a good model for the evolution of trade-offs and their genetic basis.
The detection of footprints of natural selection in genetic polymorphism data is fundamental to understanding the genetic basis of adaptation, and has important implications for human health. The standard approach has been to reject neutrality in favor of selection if the pattern of variation at a candidate locus was significantly different from the predictions of the standard neutral model. The problem is that the standard neutral model assumes more than just neutrality, and it is almost always possible to explain the data using an alternative neutral model with more complex demography. Today's wealth of genomic polymorphism data, however, makes it possible to dispense with models altogether by simply comparing the pattern observed at a candidate locus to the genomic pattern, and rejecting neutrality if the pattern is extreme. Here, we utilize this approach on a truly genomic scale, comparing a candidate locus to thousands of alleles throughout the Arabidopsis thaliana genome. We demonstrate that selection has acted to increase the frequency of early-flowering alleles at the vernalization requirement locus FRIGIDA. Selection seems to have occurred during the last several thousand years, possibly in response to the spread of agriculture. We introduce a novel test statistic based on haplotype sharing that embraces the problem of population structure, and so should be widely applicable.
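The model-free comparison described above, in which a candidate locus is judged against the genome-wide distribution of the same statistic, can be illustrated with an empirical p-value. This is only a sketch under stated assumptions: the function name `empirical_p_value` is hypothetical, the statistic is assumed to be one where larger values are more extreme, and the simulated genomic values stand in for real data. It does not implement the haplotype-sharing statistic introduced in the abstract.

```python
import numpy as np

def empirical_p_value(candidate_stat, genomic_stats):
    """Share of genome-wide loci whose statistic is at least as large as the candidate's.

    Uses an add-one correction so the result is never exactly zero.
    """
    genomic_stats = np.asarray(genomic_stats, dtype=float)
    n_extreme = np.sum(genomic_stats >= candidate_stat)
    return (n_extreme + 1) / (genomic_stats.size + 1)

# Usage with made-up genome-wide values of some summary statistic.
rng = np.random.default_rng(0)
genomic = rng.normal(size=5000)
p = empirical_p_value(4.0, genomic)
```

Because the reference distribution comes from the genome itself, demography and population structure that affect all loci are absorbed into the comparison; the cost is that the test only flags loci that are unusual relative to the rest of the genome.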
Unlike most of its close relatives, Arabidopsis thaliana is capable of self-pollination. In other members of the mustard family, outcrossing is ensured by the complex self-incompatibility (S) locus, which harbors multiple diverged specificity haplotypes that effectively prevent selfing. We investigated the role of the S locus in the evolution of and transition to selfing in A. thaliana. We found that the S locus of A. thaliana harbored considerable diversity, which is an apparent remnant of polymorphism in the outcrossing ancestor. Thus, the fixation of a single inactivated S-locus allele cannot have been a key step in the transition to selfing. An analysis of the genome-wide pattern of linkage disequilibrium suggests that selfing most likely evolved roughly a million years ago or more.
Pleural malignant mesothelioma (MM) is an aggressive cancer with a very long latency and a very short median survival. Little is known about the genetic events that trigger MM and their relation to poor outcome. The goal of our study was to characterize major genomic gains and losses associated with MM origin and progression and assess their clinical significance. We performed Representative Oligonucleotide Microarray Analysis (ROMA) on DNA isolated from tumors of 22 patients whose disease recurred at variable intervals after surgery. The total number of copy number alterations (CNA) and frequent imbalances for patients with short time (<12 months from surgery) and long time to recurrence were recorded and mapped using the Analysis of Copy Errors algorithm. We report a profound increase in CNA in the short-time recurrence group with most chromosomes affected, which can be explained by chromosomal instability associated with MM. Deletions in chromosomes 22q12.2, 19q13.32 and 17p13.1 appeared to be the most frequent events (55-74%) shared between MM patients, followed by deletions in 1p, 9p, 9q, 4p, 3p and gains in 5p, 18q, 8q and 17q (23-55%). Deletions in 9p21.3 encompassing CDKN2A/ARF and CDKN2B were characterized as specific for the short-term recurrence group. Analysis of the minimal common areas of frequent gains and losses identified candidate genes that may be involved in different stages of MM: OSM (22q12.2), FUS1 and PL6 (3p21.3), DNAJA1 (9p21.1) and CDH2 (18q11.2-q12.3). Imbalances seen by ROMA were confirmed by Affymetrix genome analysis in a subset of samples.