Abbreviations: AMMI, additive main effects and multiplicative interaction; ASV, additive main effects and multiplicative interaction stability value; AVRC, index and the ranks of the mean yields; BLUP, best linear unbiased prediction; EV, averages of the squared eigenvector values; GEI, genotype × environment interaction; HMGV, harmonic mean of genotypic values; HMRPGV, harmonic mean of relative performance of genotypic values; IPCA, interaction principal component axis; LMM, linear mixed-effect model; MET, multi-environment trials; NF, no fungicide; RCBD, randomized complete block design; RMSPD, root mean square prediction difference; SPIC, sums of the absolute value of the IPCA scores; SVD, singular value decomposition; WAASB, weighted average of absolute scores from the singular value decomposition of the matrix of best linear unbiased predictions for the genotype × environment interaction effects generated by a linear mixed-effect model; WAASBY, weighted average of weighted average of absolute scores from the singular value decomposition of the matrix of best linear unbiased predictions for the genotype × environment interaction effects generated by a linear mixed-effect model and response variable; WF, with fungicide; Za, absolute value of the relative contribution of interaction principal component axes to the interaction. Biometry, Modeling, and Statistics. Published in Agron.
1. Multi-environment trials (MET) are crucial steps in plant breeding programs that aim at increasing crop productivity to ensure global food security. The analysis of MET data requires the combination of several approaches including data manipulation, visualization and modelling. As new methods are proposed, analysing MET data correctly and completely remains a challenge, often intractable with existing tools. 2. Here we describe the metan R package, a collection of functions that implement a workflow-based approach to (a) check, manipulate and summarize typical MET data; (b) analyse individual environments using both fixed and mixed-effect models; (c) compute parametric and nonparametric stability statistics; (d) implement biometrical models widely used in MET analysis and (e) plot typical MET data quickly. 3. In this paper, we present a summary of the functions implemented in metan and how they integrate into a workflow to explore and analyse MET data. We guide the user along a gentle learning curve and show how, by adding only a few commands or options at a time, powerful analyses can be implemented. 4. metan offers a flexible, intuitive and richly documented working environment with tools that will facilitate the implementation of a complete analysis of MET datasets. KEYWORDS: additive main effect and multiplicative interaction, biometry, genotype-environment interaction, GGE biplot, multi-environment trials, R software, stability, statistics
Abbreviations: AMMI, additive main effects and multiplicative interaction; ASV, additive main effects and multiplicative interaction stability value; AUDPC, area under the disease progress curve; BLUP, best linear unbiased prediction; CW, caryopses weight; GEI, genotype × environment interaction; GSI, genotype stability index; GW, grain weight; GWP, grain weight per panicle; GY, grain yield; HI, hulling index; HW, hectoliter weight; IGY, industrial grain yield; IPCA, interaction principal component axis; LMM, linear mixed-effect model; MET, multi-environment trial; MPE, mean performance and stability; MTSI, multi-trait stability index; NEP, number of spikelets per panicle; NG2, number of grains >2 mm; NGP, number of grains per panicle; PL, panicle length; PM, panicle mass; TGW, thousand-grain weight; WAASB, weighted average of absolute scores from the singular value decomposition of the matrix of best linear unbiased predictions for the genotype × environment interaction effects generated by a linear mixed-effect model; WAASBY, weighted average of WAASB and response variable. Biometry, Modeling, and Statistics. Published in Agron.
Motivation: Multivariate data are common in biological experiments, and using the information on multiple traits is crucial to make better decisions for treatment recommendations or genotype selection. However, identifying genotypes/treatments that combine high performance across many traits has been a challenging task. Classical linear multi-trait selection indexes are available, but the presence of multicollinearity and the arbitrary choice of weighting coefficients may erode the genetic gains. Results: We propose a novel approach for genotype selection and treatment recommendation based on multiple traits that overcomes the fragility of classical linear indexes. Here, we use the distance between the genotypes/treatments and an ideotype defined a priori as a multi-trait genotype-ideotype distance index (MGIDI) to provide a selection process that is unique, easy to interpret, and free from weighting coefficients and multicollinearity issues. The performance of the MGIDI index is assessed through a Monte Carlo simulation study in which the percentage of success in selecting traits with desired gains is compared with classical and modern indexes under different scenarios. Two real plant datasets are used to illustrate the application of the index from breeders' and agronomists' points of view. Our experimental results indicate that MGIDI can effectively select superior treatments/genotypes based on multi-trait data, outperforming state-of-the-art methods and helping practitioners to make better strategic decisions towards an effective multivariate selection in biological experiments. Availability and implementation: The source code is available in the R package metan (https://github.com/TiagoOlivoto/metan) under the function mgidi(). Supplementary information: Supplementary data are available at Bioinformatics online.
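The core idea of ranking genotypes by their distance to an ideotype can be illustrated with a minimal Python sketch. This is not the mgidi() implementation: it standardizes each trait and measures the Euclidean distance directly in trait space, so it does not address multicollinearity. The genotype names, trait values, and the `ideotype_distance` helper are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, pstdev


def standardize(column):
    """Center a trait column on its mean and scale it to unit standard deviation."""
    m = mean(column)
    s = pstdev(column)
    return [(v - m) / s for v in column]


def ideotype_distance(data, goals):
    """Euclidean distance from each genotype to an ideotype.

    data  : dict mapping genotype name -> list of trait values
    goals : list with "max" or "min" per trait, i.e. the desired direction
    The ideotype takes, for each standardized trait, the best observed value.
    """
    names = list(data)
    n_traits = len(goals)
    columns = [standardize([data[g][j] for g in names]) for j in range(n_traits)]
    ideotype = [max(c) if goal == "max" else min(c) for c, goal in zip(columns, goals)]
    return {
        g: math.sqrt(sum((columns[j][i] - ideotype[j]) ** 2 for j in range(n_traits)))
        for i, g in enumerate(names)
    }


# Hypothetical data: [grain yield (higher is better), plant height (lower is better)]
trials = {"G1": [5.0, 92.0], "G2": [4.5, 90.0], "G3": [3.0, 110.0]}
distances = ideotype_distance(trials, ["max", "min"])
ranking = sorted(distances, key=distances.get)  # closest to the ideotype first
print(ranking)
```

A smaller distance means the genotype is closer to the ideotype on all traits at once, which is why no per-trait weighting coefficients are needed in this kind of index.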
We proposed a workflow for nonlinear modeling of data from multiple‐harvest crops. We demonstrated why the nonlinearity measures should be used to select nonlinear models. We demonstrated how the critical points describe the production of multiple‐harvest crops. Logistic model parameters determine the precocity and the concentration of production. Growth models are an alternative to ANOVA in analyzing data from multiple‐harvest crops. Nonlinear growth models have been widely used for analyzing production curves with a sigmoidal pattern; however, not all the benefits that these models provide are being fully exploited. Our aim here is to provide a step‐by‐step guide on how to choose a nonlinear model with parameters close to being unbiased, and to show how to estimate and interpret the critical points of a model aimed at determining the precocity and concentration of the production. Data from two uniformity trials conducted with eggplant (Solanum melongena L.) were used for this purpose. The Brody, Gompertz, logistic, and von Bertalanffy models were fitted to predict the number and fresh mass of fruits per plant. The model with lower nonlinearity measures and lower bias of the parameter estimates was selected. All the tested models presented satisfactory goodness‐of‐fit measures, but they differed regarding nonlinearity measures. The logistic model was selected because it had lower intrinsic and parametric nonlinearity and lower bias in parameter estimates. The inflection point and the maximum acceleration/deceleration points of this model provide detailed information on production throughout the productive cycle. Finally, using the logistic model as an example, we demonstrate that lower values of β2 are related to an earlier maximum production rate, and higher values of β3 are related to an earlier production that is concentrated in fewer days. The nonlinearity measures were important for the model selection. Thus, it is strongly recommended that nonlinearity be estimated and used to select nonlinear models in future studies.
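The role of β2 and β3 can be made concrete with a short Python sketch. It assumes the logistic model is parameterized as y = β1 / (1 + exp(β2 − β3·x)), which the abstract does not state explicitly. Under that assumption, the inflection point lies at x = β2/β3 and the maximum acceleration and deceleration points at x = (β2 ∓ ln(2 + √3))/β3. The parameter values below are hypothetical.

```python
import math


def logistic(x, b1, b2, b3):
    """Logistic growth curve, assuming y = b1 / (1 + exp(b2 - b3 * x))."""
    return b1 / (1.0 + math.exp(b2 - b3 * x))


def critical_points(b2, b3):
    """Critical points of the assumed logistic parameterization.

    inflection       : second derivative is zero (maximum production rate)
    max_acceleration : third derivative is zero, before the inflection
    max_deceleration : third derivative is zero, after the inflection
    """
    c = math.log(2.0 + math.sqrt(3.0))  # about 1.317
    return {
        "max_acceleration": (b2 - c) / b3,
        "inflection": b2 / b3,
        "max_deceleration": (b2 + c) / b3,
    }


def concentration_span(b2, b3):
    """Days between maximum acceleration and maximum deceleration."""
    pts = critical_points(b2, b3)
    return pts["max_deceleration"] - pts["max_acceleration"]


# Hypothetical parameter values, for illustration only
base = critical_points(b2=4.0, b3=0.10)
lower_b2 = critical_points(b2=3.0, b3=0.10)   # earlier maximum production rate
higher_b3 = critical_points(b2=4.0, b3=0.20)  # earlier and more concentrated
print(base, lower_b2, higher_b3)
```

Because the span between the acceleration and deceleration points equals 2·ln(2 + √3)/β3 in this parameterization, it depends only on β3, which is one way to read the statement that higher β3 concentrates production in fewer days.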
Multicollinearity in path analysis was investigated in different scenarios. A biometrical approach identified the multicollinearity‐generating traits. Data derived from averages overestimated the correlation coefficients. The use of all sampled observations increased the accuracy of path analysis. A simple sample tracking method that reduces multicollinearity is proposed. Some commonly used data arrangement methods may mask correlation coefficients among explanatory traits, increasing multicollinearity in multiple regression analysis. This study was performed to determine whether the harmful effects of multicollinearity might be reduced in the estimation of the X′X correlation matrix among explanatory traits. For this, data on 45 treatments (15 maize [Zea mays L.] hybrids sown in three places) were used. Three path analysis methods (traditional, with k inclusion, and traditional with trait exclusion) were tested in two scenarios: with the X′X matrix estimated with all sampled observations (ASO, n = 900) and with the X′X matrix estimated with the average values of each plot (AVP, n = 180). The condition number (CN) was reduced from 3395 to 2004 when the matrix was estimated with all observations. On average, the variance inflation factors of the regression coefficients were 61% higher in the AVP scenario. The addition of the k coefficient reduced the CN to 85.40 and 51.17 for the ASO and AVP scenarios, respectively. Exclusion of multicollinearity‐generating traits was more effective in the ASO than in the AVP scenario, resulting in CNs of 29.62 and 63.66, respectively. The largest coefficient of determination (0.977) and the smallest noise (0.150) were obtained in the ASO scenario after the exclusion of the multicollinearity‐generating traits. The use of all sampled observations does not mask the individual variances and reduces the magnitude of the correlations among explanatory traits in 90% of cases, improving the accuracy of biological studies involving path analysis.
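The diagnostics named above can be sketched in Python for the simplest case of two explanatory traits with correlation r, where closed forms exist. It assumes the condition number is the ratio of the largest to the smallest eigenvalue of the X′X correlation matrix, and that "k inclusion" means adding a constant k to the diagonal of X′X, as in ridge regression. Both are assumptions about the method, and the values of r and k are hypothetical.

```python
def condition_number(r, k=0.0):
    """Condition number of the 2x2 correlation matrix [[1, r], [r, 1]].

    Its eigenvalues are 1 + |r| and 1 - |r|; adding k to the diagonal
    (assumed meaning of "k inclusion") shifts both eigenvalues by k.
    """
    if abs(r) >= 1.0 and k == 0.0:
        raise ValueError("matrix is singular when |r| = 1 and k = 0")
    lam_max = 1.0 + abs(r) + k
    lam_min = 1.0 - abs(r) + k
    return lam_max / lam_min


def variance_inflation_factor(r):
    """VIF of either trait in the two-trait case: 1 / (1 - r^2)."""
    if abs(r) >= 1.0:
        raise ValueError("VIF is undefined when |r| = 1")
    return 1.0 / (1.0 - r * r)


# Hypothetical correlations: plot averages often show a stronger correlation
# than individual observations, which inflates both diagnostics.
cn_avg = condition_number(0.99)
cn_obs = condition_number(0.95)
cn_ridge = condition_number(0.99, k=0.1)
print(cn_avg, cn_obs, cn_ridge, variance_inflation_factor(0.99))
```

In this toy case, either a weaker correlation among traits or a small k on the diagonal lowers the condition number, which mirrors the direction of the effects reported for the ASO scenario and for the k inclusion method.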
Guar, the most popular vegetable, is tolerant of drought and is a valuable industrial crop widely grown across India, Pakistan, the USA, and South Africa for the galactomannan (gum) in its seed endosperm, which is used in pharmaceuticals and cosmetics. Guar genotypes with productive traits that perform well under differing environmental conditions are a top priority for genotype selection. This can be achieved by employing multivariate trait analysis. In this context, the Multi-Trait Stability Index (MTSI) and the Multi-Trait Genotype-Ideotype Distance Index (MGIDI) were employed to identify high-performing genotypes across multiple traits. In the current investigation, 85 guar accessions grown in different seasons were assessed for 15 morphological traits. The results obtained by the MTSI and MGIDI indexes revealed that, out of 85, only 13 genotypes performed better across and within the seasons, and, based on the coincidence index, only three genotypes (IC-415106, IC-420320, and IC-402301) were found to be stable with high seed production in multi-environmental conditions. The view of strengths and weaknesses provided by the MGIDI reveals that breeders have concentrated on developing genotypes with desired traits, such as gum quality and seed yield. The strengths of the ideal genotypes in the present work are mainly high gum content, a short crop cycle, and high seed yield with good biochemical traits. Thus, MTSI and MGIDI serve as novel tools for selecting desired genotypes across multiple environments in plant breeding programs, owing to their uniqueness and ease of interpretation with minimal multicollinearity issues.
Experiments measuring the interaction between genotypes and environments measure the spatial (e.g., locations) and temporal (e.g., years) separation and/or combination of these factors. The genotype-by-environment interaction (GEI) is very important in plant breeding programs. Over the past six decades, the propensity to model the GEI has led to the development of several models and mathematical methods, called "stability analyses", for deciphering GEI in multi-environmental trials (METs). However, its size is hidden by the contribution of improved management to the yield increase, and for this reason comparisons of new with old varieties in a single experiment could reveal its real size. Because the proposed methods and analytical models differ inherently, researchers who calculate stability indices, and ultimately select the superior genotypes, need to dissect their usefulness. Thus, we have collected statistics, as well as models and their equations, to explore these methods further. This review introduces a complete set of parametric and non-parametric methods and models, with a selection pattern based on each of them. Furthermore, we have aligned each method or statistic with matching software, macro codes, and/or scripts.