The promise of association genetics to identify genes or genomic regions controlling complex traits has generated a flurry of interest. Such phenotype-genotype associations could be used to accelerate tree breeding cycles and to increase precision and selection intensity for late-expressing, low-heritability traits. However, the prospects of association genetics in highly heterozygous undomesticated forest trees can be severely impacted by the presence of cryptic population and pedigree structure. To investigate how to better account for this, we compared the general linear model (GLM) and five combinations of the Unified Mixed Model (UMM) using data from a low-density genome-wide association study for growth and wood property traits carried out in a Eucalyptus globulus population (n = 303) with 7,680 Diversity Array Technology (DArT) markers. Model comparisons were based on the degree of deviation from the uniform distribution and estimates of the mean square differences between the observed and expected p-values of all significant marker-trait associations detected. Our analysis revealed the presence of population and family structure. No single model was best for all traits. Striking differences in detection power and accuracy were observed among the different models, especially when population structure was not accounted for. Overall, the UMM approach produced superior results compared with GLM for all traits. Following stringent correction for false discoveries, 18 marker-trait associations were detected: 16 for tree diameter growth and two for lignin monomer composition (S∶G ratio), a key wood property trait. The two DArT markers associated with S∶G ratio on chromosome 10 physically map within 1 Mbp of the ferulate 5-hydroxylase (F5H) gene, providing a putative independent validation of this marker-trait association.
This study details the merit of jointly integrating population structure and relatedness in association analyses in undomesticated, highly heterozygous forest trees, and provides additional insights into the nature of complex quantitative traits in Eucalyptus.
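The model-comparison criterion described above (deviation of p-values from the uniform distribution, summarized as a mean square difference between observed and expected p-values) can be illustrated with a minimal sketch. The function name `msd_from_uniform` and the choice of expected quantiles i/(n+1) are assumptions made for illustration; the abstract does not state which expected-value formula or software the authors used.

```python
def msd_from_uniform(p_values):
    """Mean squared difference between sorted observed p-values and the
    p-values expected under a uniform null distribution.

    Assumption: the expected value for the i-th smallest of n p-values
    is taken as i / (n + 1); other conventions exist.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("p_values must not be empty")
    observed = sorted(p_values)
    expected = [(i + 1) / (n + 1) for i in range(n)]
    return sum((o - e) ** 2 for o, e in zip(observed, expected)) / n


if __name__ == "__main__":
    # Hypothetical p-values from two models fitted to the same trait.
    well_calibrated = [0.25, 0.5, 0.75]
    inflated = [0.001, 0.002, 0.003]
    print(msd_from_uniform(well_calibrated))
    print(msd_from_uniform(inflated))
```

Under this criterion, a model whose p-values track the uniform expectation more closely (a smaller value) is controlling false positives better than one whose p-values are skewed toward zero.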
Restriction site-associated DNA sequencing (RADseq) and its derived protocols, such as double digest RADseq (ddRADseq), offer a flexible and highly cost-effective strategy for efficient plant genome sampling. RADseq has become one of the most popular genotyping approaches for breeding, conservation, and evolution studies in model and non-model plant species. However, universal protocols do not always adapt well to non-model species. This study reports the development of an optimized and detailed ddRADseq protocol for Eucalyptus dunnii, a non-model species, which combines different aspects of published methodologies. The initial protocol was established using only two samples by selecting the best combination of enzymes, optimizing size selection, and simplifying lab procedures. Both single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) were determined with high accuracy after applying stringent bioinformatics settings and quality filters, with and without a reference genome. To scale the protocol up to 24 samples, we added barcoded adapters. We also applied automatic size selection, which yielded an optimal number of loci, the expected SNP locus density, and genome-wide distribution. Reliability and cross-sequencing platform compatibility were verified through dissimilarity coefficients of 0.05 between replicates. To our knowledge, this optimized ddRADseq protocol will allow users to go from the DNA sample to genotyping data in a highly accessible and reproducible way.
Genomic selection based on the single-step genomic best linear unbiased prediction (ssGBLUP) approach is becoming an important tool in forest tree breeding. The quality of the variance components and the predictive ability of the genomic estimated breeding values (GEBV) depend on how well marker-based genomic relationships describe the actual genetic relationships at unobserved causal loci. We investigated the performance of GEBV obtained when fitting models with genomic covariance matrices based on two identity-by-descent (IBD) and two identity-by-state (IBS) relationship measures. Multiple-trait multiple-site ssGBLUP models were fitted to diameter and stem straightness in five open-pollinated progeny trials of Eucalyptus dunnii, genotyped using the EUChip60K. We also fitted the conventional ABLUP model with a pedigree-based covariance matrix. Estimated relationships from the IBD estimators displayed consistently lower standard deviations than those from the IBS approaches. Although ssGBLUP based on IBS estimators resulted in higher trait-site heritabilities, the gain in accuracy of the relationships using IBD estimators resulted in higher predictive ability and lower bias of GEBV, especially for low-heritability trait-site combinations. ssGBLUP based on either IBS or IBD approaches performed considerably better than the traditional ABLUP. In summary, our results advocate the use of the ssGBLUP approach jointly with the IBD relationship matrix in open-pollinated forest tree evaluation.
Background: Functional genetic markers have important implications for genetic analysis by providing direct estimation of functional diversity. Although high-throughput sequencing techniques for functional diversity analysis are being developed, the use of already well-established variable markers present in candidate genes is still an interesting alternative for mapping purposes and functional diversity studies. SSR markers are routinely used in most plant and animal breeding programs for many species, including Eucalyptus. SSR markers derived from candidate genes (SSR-CG) can be used effectively in co-segregation studies and marker-assisted diversity management. Results: In the present study, eight previously unreported SSRs were identified in seven candidate genes for wood properties in Eucalyptus globulus: cinnamoyl CoA reductase (CCR), homocysteine S-methyltransferase (HMT), shikimate kinase (SK), xyloglucan endotransglycosylase 2 (XTH2), cellulose synthase 3 (CesA3), glutathione S-transferase (GST) and the transcription factor LIM1. Microsatellites were located in promoters, introns and exons, most of them being CT dinucleotide repeats. Genetic diversity of these eight CG-derived SSR markers was explored in 54 unrelated genotypes. Except for XTH2, high levels of polymorphism were detected: 93 alleles (mean of 13.1 alleles per locus, SD 1.6), a mean effective number of alleles (Ne) of 5.4 (SD 1.6), polymorphic information content (PIC) values from 0.617 to 0.855 and probability of identity (PI) ranging from 0.030 to 0.151. Conclusions: This is the first report on the identification, characterization and diversity analysis of microsatellite markers located inside wood quality candidate genes (CG) from Eucalyptus globulus.
This set of markers is therefore appropriate for characterizing genetic variation, with potential usefulness for quantitative trait loci (QTL) mapping in different eucalypt pedigrees and for other applications such as fingerprinting and marker-assisted diversity management.
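The per-locus diversity statistics reported above (Ne, PIC and PI) can all be derived from allele frequencies. The following is a minimal sketch using commonly cited textbook definitions: Ne as the inverse of expected homozygosity, PIC following Botstein et al. (1980), and PI as the unbiased-free form for unrelated individuals. That the authors used exactly these estimators is an assumption, as the abstract does not name them; the function names and the example frequencies are hypothetical.

```python
from itertools import combinations


def effective_alleles(freqs):
    """Effective number of alleles: Ne = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content:
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2.
    """
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


def probability_of_identity(freqs):
    """Probability that two unrelated individuals share a genotype:
    PI = sum(p_i^4) + sum_{i<j} (2 * p_i * p_j)^2.
    """
    same_homozygote = sum(p ** 4 for p in freqs)
    same_heterozygote = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return same_homozygote + same_heterozygote


if __name__ == "__main__":
    # Hypothetical allele frequencies at one SSR locus (must sum to 1).
    freqs = [0.4, 0.3, 0.2, 0.1]
    print(effective_alleles(freqs), pic(freqs), probability_of_identity(freqs))
```

A locus with many alleles at similar frequencies gives a high Ne and PIC and a low PI, which is the pattern that makes a marker useful for fingerprinting.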