Most traits of agronomic importance are quantitative in nature, and genetic markers have been used for decades to dissect such traits. Recently, genomic selection has earned attention as next-generation sequencing technologies became feasible for major and minor crops. Mixed models have become a key tool for fitting genomic selection models, but most current genomic selection software can only include a single variance component other than the error, making hybrid prediction using additive, dominance and epistatic effects infeasible for species displaying heterotic effects. Moreover, likelihood-based software for fitting mixed models with multiple random effects that allows the user to specify the variance-covariance structure of random effects has not been fully exploited. A new open-source R package called sommer is presented to facilitate the use of mixed models for genomic selection and hybrid prediction purposes using more than one variance component and allowing specification of covariance structures. The use of sommer for genomic prediction is demonstrated through several examples using maize and wheat genotypic and phenotypic data. At its core, the program contains three algorithms for estimating variance components: Average Information (AI), Expectation-Maximization (EM) and Efficient Mixed Model Association (EMMA). Kernels for calculating the additive, dominance and epistatic relationship matrices are included, along with other useful functions for genomic analysis. Results from sommer were comparable to other software, but the analysis was faster than Bayesian counterparts by hours to days. In addition, the ability to deal with missing data, combined with greater flexibility and speed than other REML-based software, was achieved by putting together some of the most efficient algorithms to fit models in a user-friendly environment such as R.
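To make the idea of an additive relationship kernel concrete, the following is a minimal sketch in plain Python. It assumes markers coded as 0/1/2 allele counts and uses a widely used VanRaden-style scaling (centered genotypes, divided by twice the sum of p(1-p)); this is an illustrative assumption and not necessarily the exact computation that sommer performs. The function name `additive_relationship` and the toy genotypes are hypothetical.

```python
def additive_relationship(markers):
    """Build an additive genomic relationship matrix.

    markers: list of rows (one per individual), each a list of 0/1/2
    allele counts (assumed coding; no missing values handled here).
    """
    n = len(markers)
    m = len(markers[0])
    # Frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in markers) / (2.0 * n) for j in range(m)]
    # Center each genotype by its expected value, 2p.
    centered = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in markers]
    # Scaling constant: 2 * sum of p * (1 - p) over markers.
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic; matrix is undefined")
    # G = Z Z' / scale, computed entry by entry.
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / scale
         for k in range(n)]
        for i in range(n)
    ]


# Toy data: 3 individuals, 4 markers (made-up values for illustration).
genotypes = [
    [0, 1, 2, 1],
    [2, 1, 0, 1],
    [1, 1, 1, 0],
]
G = additive_relationship(genotypes)
```

Because the genotypes are centered by the sample allele frequencies, every row of the resulting matrix sums to zero, which is a quick sanity check on any implementation of this scaling.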
The development of high-throughput genotyping has made genome-wide association (GWAS) and genomic selection (GS) applications possible for both model and non-model species. The exploitation of genome-assisted approaches could greatly benefit breeding efforts in American cranberry (Vaccinium macrocarpon) and other minor crops. Using biparental populations with different degrees of relatedness, we evaluated multiple GS methods for total yield (TY) and mean fruit weight (MFW). Specifically, we compared predictive ability (PA) differences between univariate and multivariate genomic best linear unbiased predictors (GBLUP and MGBLUP, respectively). We found that MGBLUP provided higher PA than GBLUP in scenarios with medium genetic correlation (8–17% increase with cor_g ~ 0.6) and high genetic correlation (25–156% increase with cor_g ~ 0.9), but found no increase when genetic correlation was low. In addition, we found that only a few hundred single nucleotide polymorphism (SNP) markers are needed to reach a plateau in PA for both traits in the biparental populations studied (in full linkage disequilibrium). We observed that higher resemblance among individuals in the training (TP) and validation (VP) populations provided greater PA. Although multivariate GS methods are available, genetic correlations and other factors need to be carefully considered when applying these methods for genetic improvement.
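The comparison above can be sketched in a few lines of Python. The abstract does not define how PA was computed; this sketch assumes the common convention that PA is the Pearson correlation between predicted and observed values in the validation population, and that the reported percentages are relative gains over the univariate baseline. The function names and all numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def percent_gain(pa_new, pa_base):
    """Relative change of pa_new over pa_base, in percent."""
    return 100.0 * (pa_new - pa_base) / pa_base


# Made-up validation-set values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.1]
pa = pearson(predicted, observed)
```

Under this convention, a multivariate model with PA 0.5 against a univariate PA of 0.4 would correspond to `percent_gain(0.5, 0.4)`, a 25% increase.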
Since its domestication 200 years ago, breeding of the American cranberry (Vaccinium macrocarpon) has relied on phenotypic selection because applicable resources for molecular improvement strategies such as marker-assisted selection (MAS) remain limited. To enable MAS in cranberry, the first high-density SSR linkage map with 541 markers representing all 12 cranberry chromosomes was constructed for the CNJ02-1 progeny from a cross of elite cultivars, CNJ97-105-4 and NJ98-23. The population was phenotyped for a 3-year period for total yield (TY), mean fruit weight (MFW), and biennial bearing index (BBI), and data were analyzed using mixed models and best linear unbiased predictors (BLUPs). Significant differences between genotypes were observed for all traits. Quantitative trait loci (QTL) analyses using BLUPs identified four MFW QTL on three linkage groups (LGs), three TY QTL on three LGs, and one BBI QTL which colocalized with a TY QTL. Local BLAST of a cranberry nuclear genome assembly identified homologous sequences for the mapped SSRs which were then anchored to 12 pseudo-chromosomes using the linkage map information. Analyses comparing coding regions (CDS) anchored in the cranberry linkage map with grape, kiwifruit, and tomato genomes were …
Background: Determination of microsatellite lengths or other DNA fragment types is an important initial component of many genetic studies such as mutation detection, linkage and quantitative trait loci (QTL) mapping, genetic diversity, pedigree analysis, and detection of heterozygosity. A handful of commercial and freely available software programs exist for fragment analysis; however, most of them are platform dependent and lack high-throughput applicability. Results: We present the R package Fragman to serve as a freely available and platform-independent resource for automatic scoring of DNA fragment lengths in diversity panels and biparental populations. The program analyzes DNA fragment lengths generated on Applied Biosystems® (ABI) platforms, either manually or automatically by providing panels or bins. The package contains additional tools for converting the allele calls to GenAlEx, JoinMap® and OneMap software formats, mainly used for genetic diversity analysis and for generating linkage maps in plant and animal populations. Easy plotting functions and multiplexing-friendly capabilities are some of the strengths of this R package. Fragment analysis using a unique set of cranberry (Vaccinium macrocarpon) genotypes based on microsatellite markers is used to highlight the capabilities of Fragman. Conclusion: Fragman is a valuable new tool for genetic analysis. The package produces equivalent results to other popular software for fragment analysis while possessing unique advantages and the possibility of automation for high-throughput experiments by exploiting the power of R. Electronic supplementary material: The online version of this article (doi:10.1186/s12863-016-0365-6) contains supplementary material, which is available to authorized users.
Background: The application of genotyping by sequencing (GBS) approaches, combined with data imputation methodologies, is narrowing the genetic knowledge gap between major and understudied, minor crops. GBS is an excellent tool to characterize the genomic structure of recently domesticated (~200 years) and understudied species, such as cranberry (Vaccinium macrocarpon Ait.), by generating large numbers of markers for genomic studies such as genetic mapping. Results: We identified 10842 potentially mappable single nucleotide polymorphisms (SNPs) in a cranberry pseudo-testcross population, wherein 5477 SNPs and 211 short sequence repeats (SSRs) were used to construct a high-density linkage map in which a total of 4849 markers were mapped. Recombination frequency, linkage disequilibrium (LD), and segregation distortion at the genomic level in the parental and integrated linkage maps were characterized for the first time in cranberry. SSR markers, used as the backbone in the map, revealed high collinearity with previously published linkage maps. The 4849-point map consisted of twelve linkage groups spanning 1112 cM, which anchored 2381 nuclear scaffolds accounting for ~13 Mb of the estimated 470 Mb cranberry genome. Bin mapping identified 592 and 672 unique bins in the parental maps and a total of 1676 unique marker positions in the integrated map. Synteny analyses comparing the order of anchored cranberry scaffolds to their homologous positions in kiwifruit, grape, and coffee genomes provided initial evidence of homology between cranberry and closely related species. Conclusions: GBS data were used to rapidly saturate the cranberry genome with markers in a pseudo-testcross population. Collinearity between the present saturated genetic map and previous cranberry SSR maps suggests that the SNP locations represent accurate marker order and chromosome structure of the cranberry genome. SNPs greatly improved current marker genome coverage, which allowed for genome-wide structure investigations such as segregation distortion, recombination, linkage disequilibrium, and synteny analyses. In the future, GBS can be used to accelerate cranberry molecular breeding through QTL mapping and genome-wide association studies (GWAS). Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2802-3) contains supplementary material, which is available to authorized users.
In the last decade the use of mixed models has become a pivotal part of the implementation of genome-assisted prediction in plant and animal breeding programs. Exploiting genetic correlation among traits through multivariate predictions has been proposed in recent years as a way to boost prediction accuracy and better understand pleiotropy and other genetic and ecological phenomena. Multiple mixed model solvers able to use relationship matrices or deal with marker-based incidence matrices have been released in recent years, but multivariate versions are scarce. Such solvers have become quite popular in plant and animal breeding thanks to user-friendly platforms such as R. Among such software, one of the most recent and popular is the sommer package. In this short communication we discuss the update of the package that is able to run multivariate mixed models with multiple random effects and different covariance structures at the level of random effects and trait-to-trait covariance, along with other functionalities for genetic analysis and field trial analysis to enhance the genome-assisted prediction capabilities of researchers.

Introduction

Currently, linear mixed models play an important role in science to better understand different biological and non-biological phenomena (Bolker 2009; Gianola and Rosa 2015). In brief, linear mixed models are extensions of linear models (Hastie et al., 2009). Among the many statistical tools used to test hypotheses and estimate parameters, linear mixed models are particularly robust and flexible, which has given them a pivotal role in biological sciences such as plant and animal breeding and ecology (Bolker et al. 2009; Hadfield 2010; Gianola and Rosa 2015). For example, in the field of genetics, most of the published genome-wide association and quantitative genetics studies are based on mixed models and REML estimation (Bush et al. 2012; Hirschhorn et al. 2005; Wang et al. 2005; Kang et al. 2008, 2010).

In general terms, current mixed model solvers use Frequentist or Bayesian approaches to estimate parameters and solve the linear equations, and both approaches rely on statistical assumptions (i.e. distributions) on the response or covariates to be modeled (Gianola and Rosa 2015; Hastie et al., 2009). Among Frequentist software and different optimization techniques, mixed models are usually solved by either of the two most popular REML/ML methods: mixed model equation-based (MME) algorithms, based on the ideas of Henderson and Searle (Gilmour et al 1995; Henderson 1975; Searle 1993), and direct-inversion-based (DI) algorithms, which are the natural solution for the linear equations in the mixed model context (Lee et al 2016; Maier 2015). Both methods require iterative procedures to estimate the variance-covariance parameters and coefficients […] number of observations and dense covariance structures are used (p > n). On the other hand, Bayesian approaches implement Markov chain Mo...
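For readers unfamiliar with the two Frequentist solver families named in the introduction above, the standard textbook forms can be written out as follows. The notation is an assumption of this sketch rather than the authors' own: a single-trait model with fixed effects β, random effects u with covariance G, and residual covariance R.

```latex
% Linear mixed model (assumed notation)
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad \mathbf{u} \sim N(\mathbf{0}, \mathbf{G}),
\qquad \mathbf{e} \sim N(\mathbf{0}, \mathbf{R})

% MME-based solution (Henderson's mixed model equations)
\begin{bmatrix}
\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1}
\end{bmatrix}
\begin{bmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{u}} \end{bmatrix}
=
\begin{bmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{y} \end{bmatrix}

% Direct-inversion solution, using the phenotypic covariance V
\mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R},
\qquad
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{V}^{-1}\mathbf{y},
\qquad
\hat{\mathbf{u}} = \mathbf{G}\mathbf{Z}'\mathbf{V}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})
```

Both forms yield the same estimates for given G and R. The practical difference lies in which matrix is inverted: the MME coefficient matrix has the dimension of the number of effects, whereas V has the dimension of the number of observations.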
The American cranberry (Vaccinium macrocarpon Ait.) is a recently domesticated, economically important fruit crop with limited molecular resources. New genetic resources could accelerate genetic gain in cranberry through characterization of its genomic structure and by enabling molecular-assisted breeding strategies. To increase the availability of cranberry genomic resources, genotyping-by-sequencing (GBS) was used to discover and genotype thousands of single nucleotide polymorphisms (SNPs) within three interrelated cranberry full-sib populations. Additional simple sequence repeat (SSR) loci were added to the SNP datasets and used to construct bin maps for the parents of the populations, which were then merged to create the first high-density cranberry composite map containing 6073 markers (5437 SNPs and 636 SSRs) on 12 linkage groups (LGs) spanning 1124 cM. Interestingly, higher rates of recombination were observed in maternal than paternal gametes. The large number of markers in common (mean of 57.3) and the high degree of observed collinearity (mean pairwise Spearman rank correlations >0.99) between the LGs of the parental maps demonstrate the utility of GBS in cranberry for identifying polymorphic SNP loci that are transferable between pedigrees and populations in future trait-association studies. Furthermore, the high density of markers anchored within the component maps allowed identification of segregation distortion regions, placement of centromeres on each of the 12 LGs, and anchoring of genomic scaffolds. Collectively, the results represent an important contribution to the current understanding of cranberry genomic structure and to the availability of molecular tools for future genetic research and breeding efforts in cranberry.
Key message: Integration of multi-omics data improved prediction accuracies of oat agronomic and seed nutritional traits in multi-environment trials and distantly related populations in addition to the single-environment prediction. Abstract: Multi-omics prediction has been shown to be superior to genomic prediction with genome-wide DNA-based genetic markers (G) for predicting phenotypes. However, most of the existing studies were based on historical datasets from one environment; therefore, they were unable to evaluate the efficiency of multi-omics prediction in multi-environment trials and distantly related populations. To fill those gaps, we designed a systematic experiment to collect omics data and evaluate 17 traits in two oat breeding populations planted in single and multiple environments. In the single-environment trial, transcriptomic BLUP (T), metabolomic BLUP (M), G + T, G + M, and G + T + M models showed greater prediction accuracy than GBLUP for 5, 10, 11, 17, and 17 traits, respectively, and metabolites generally performed better than transcripts when combined with SNPs. In the multi-environment trial, multi-trait models with omics data outperformed both counterpart multi-trait GBLUP models and single-environment omics models, and the highest prediction accuracy was achieved when modeling genetic covariance as an unstructured covariance model. We also demonstrated that omics data can be used to prioritize loci from one population with omics data to improve genomic prediction in a distantly related population using a two-kernel linear model that accommodated both likely causal loci with large effects and loci that explain little or no phenotypic variance. We propose that the two-kernel linear model is superior to most genomic prediction models that assume each variant is equally likely to affect the trait and can be used to improve prediction accuracy for any trait with prior knowledge of genetic architecture.
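The abstract does not give the form of the two-kernel linear model. One generic way such a model is often written is shown below as a sketch: every symbol is an assumption, with K_1 built from the prioritized (likely causal) loci and K_2 from the remaining loci, each with its own variance component.

```latex
% Generic two-kernel linear model (assumed notation, not the authors' own)
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{g}_1 + \mathbf{g}_2 + \mathbf{e},
\qquad
\mathbf{g}_1 \sim N(\mathbf{0}, \mathbf{K}_1\sigma^2_{g_1}),
\quad
\mathbf{g}_2 \sim N(\mathbf{0}, \mathbf{K}_2\sigma^2_{g_2}),
\quad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\sigma^2_{e})
```

Setting the two variance components to be estimated separately is what lets the prioritized loci carry a different per-locus weight than the background loci, in contrast to a single-kernel GBLUP in which all variants share one variance.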