Introduction: Batch effects in large untargeted metabolomics experiments are almost unavoidable, especially when sensitive detection techniques like mass spectrometry (MS) are employed. To obtain peak intensities that are comparable across all batches, corrections need to be performed. Since non-detects, i.e., signals with an intensity too low to be detected with certainty, are common in metabolomics studies, batch correction methods need to take them into account.
Objectives: This paper compares several batch correction methods and investigates the effect of different strategies for handling non-detects.
Methods: Batch correction methods usually consist of regression models, possibly also accounting for trends within batches. To fit these models, quality control samples (QCs), injected at regular intervals, can be used. Study samples can also be used, provided that the injection order is properly randomized. Normalization methods that do not use information on batch labels or injection order can correct for batch effects as well. Introducing two easy-to-use quality criteria, we assess the merits of these batch correction strategies using three large LC–MS and GC–MS data sets of samples from Arabidopsis thaliana.
Results: The three data sets have very different characteristics, leading to clearly distinct behaviour of the batch correction strategies studied. Explicit inclusion of information on batch and injection order in general leads to very good corrections; when enough QCs are available, general normalization approaches also perform well. Several approaches are shown to be able to handle non-detects; replacing them with very small numbers such as zero seems the worst of the approaches considered.
Conclusion: The use of quality control samples for batch correction leads to good results when enough QCs are available. If an experiment is properly set up, batch correction using the study samples usually leads to a similar high-quality correction, but has the advantage that more metabolites are corrected. The strategy for handling non-detects is important: choosing small values like zero can lead to suboptimal batch corrections.
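The regression-based correction described in the Methods can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `batch_correct`, the straight-line trend per batch, the alignment to the overall QC mean, and the encoding of non-detects as NaN are all assumptions made for illustration.

```python
import numpy as np

def batch_correct(intensity, batch, order, is_qc):
    """Correct the intensities of one metabolite, batch by batch.

    Hypothetical helper: within each batch a straight line
    (intensity ~ injection order) is fitted to the detected QC samples,
    and all samples in that batch are shifted so that the fitted QC
    trend sits at the overall QC mean. Non-detects are assumed to be
    encoded as NaN; they are left out of the fit instead of being
    replaced by zero.
    """
    intensity = np.asarray(intensity, dtype=float)
    batch = np.asarray(batch)
    order = np.asarray(order, dtype=float)
    is_qc = np.asarray(is_qc, dtype=bool)

    corrected = np.full_like(intensity, np.nan)
    detected_qc = is_qc & ~np.isnan(intensity)
    target = intensity[detected_qc].mean()  # overall QC level to align to

    for b in np.unique(batch):
        in_batch = batch == b
        fit_rows = in_batch & detected_qc
        if fit_rows.sum() < 2:
            # Too few detected QCs to fit a line: this batch stays NaN,
            # i.e. the metabolite is not corrected here.
            continue
        slope, intercept = np.polyfit(order[fit_rows], intensity[fit_rows], 1)
        trend = slope * order[in_batch] + intercept
        # Remove the within-batch trend, then restore the common QC level.
        corrected[in_batch] = intensity[in_batch] - trend + target
    return corrected
```

Fitting on study samples instead of QCs would amount to passing a mask that selects the study samples, which is only reasonable when the injection order is randomized, as the abstract notes.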
Many quantitative trait loci (QTL) detection methods ignore QTL-by-environment interaction (QEI) and are limited in their accommodation of error and environment-specific variance. This paper outlines a mixed model approach using a recombinant inbred spring wheat population grown in six drought stress trials. Genotype estimates for yield, anthesis date and height were calculated using the best design and spatial effects model for each trial. Parsimonious factor analytic models best captured the variance-covariance structure, including genetic correlations, among environments. The 1RS.1BL rye chromosome translocation (from one parent), which decreased progeny yield by 13.8 g m⁻², was explicitly included in the QTL model. Simple interval mapping (SIM) was used in a genome-wide scan for significant QTL, where QTL effects were fitted as fixed environment-specific effects. All significant environment-specific QTL were subsequently included in a multi-QTL model and evaluated for main and QEI effects, with non-significant QEI effects being dropped. QTL effects (either consistent or environment-specific) included eight yield, four anthesis, and six height QTL. One yield QTL co-located with (or was linked to) an anthesis QTL, while another co-located with a height QTL. In the final multi-QTL model, only one QTL for yield (6 g m⁻²) was consistent across environments (no QEI), while the remaining QTL had significant QEI effects (average size per environment of 5.1 g m⁻²). Compared to single trial analyses, the described framework allowed explicit modelling and detection of QEI effects and incorporation of additional classification information about genotypes.
Dormancy is a state of metabolic arrest that facilitates the survival of organisms during environmental conditions incompatible with their regular course of life. Many organisms have deep dormant stages to promote an extended life span (increased longevity). In contrast, plants have seed dormancy and seed longevity described as two traits. Seed dormancy is defined as a temporary failure of a viable seed to germinate in conditions that favor germination, whereas seed longevity is defined as seed viability after dry storage (storability). In plants, the association of seed longevity with seed dormancy has not been studied in detail. This is surprising given the ecological, agronomical, and economic importance of seed longevity. We studied seed longevity to reveal its genetic regulators and its association with seed dormancy in Arabidopsis (Arabidopsis thaliana). Integrated quantitative trait locus analyses for seed longevity, in six recombinant inbred line populations, revealed five loci: Germination Ability After Storage1 (GAAS1) to GAAS5. GAAS loci colocated with seed dormancy loci, Delay Of Germination (DOG), earlier identified in the same six recombinant inbred line populations. Both GAAS loci and their colocation with DOG loci were validated by near-isogenic lines. A negative correlation was observed, with deep seed dormancy correlating with low seed longevity and vice versa. Detailed analysis of the colocating GAAS5 and DOG1 quantitative trait loci revealed that the DOG1-Cape Verde Islands allele both reduces seed longevity and increases seed dormancy. To our knowledge, this study is the first to report a negative correlation between seed longevity and seed dormancy.
Highlights:
We describe and demonstrate a multidimensional framework to integrate environmental and genomic predictors to enable crop improvement for a circular bioeconomy.
A model training procedure based on multiple phenotypes is shown to improve predictive skill.
The decision set composed of model outputs can inform selection for both productivity and circularity metrics.
Abstract: Contemporary agricultural systems are poised to transition from linear to circular, adopting concepts of recycling, repurposing, and regeneration. This transition will require changing crop improvement objectives to consider the entire system, and thus provide solutions to improve complex systems for higher productivity, resource use efficiency, and environmental quality. The methods and approaches that underpinned the doubling of yields during the last century may no longer be fully adequate to target crop improvement for circular agricultural systems. Here we propose a multidimensional framework for prediction with outcomes useful to assess both crop performance traits and environmental sustainability of the designed agricultural systems. The study focuses on maize harvestable grain yield and total carbon production, water use, and use efficiency for yield and carbon. The framework builds on the crop growth model whole genome prediction system, which is enabled by advanced phenomics and the integration of symbolic and sub-symbolic artificial intelligence. We demonstrate the approach and prediction accuracy advantages over a standard statistical genomic prediction approach used to breed maize hybrids for yield, flowering time, and kernel set using a dataset comprising 7004 hybrids, 103 breeding populations, and 62 environments resulting from six years of experimentation in maize drought breeding in the U.S. We propose this framework to motivate a dialogue on how to enable circularity in agriculture through prediction-based systems design.
Keywords: Circular bioeconomy, Circular economy, Crop improvement, Crop models, Drought, Gene editing, Genomic prediction, Maize, Plant breeding.