The phenotypes of plants develop over time and change in response to the environment. New engineering and computer vision technologies track these phenotypic changes. Identifying the genetic loci regulating differences in the pattern of phenotypic change remains challenging. This study used functional principal component analysis (FPCA) to achieve this aim. Time series phenotype data were collected from a sorghum (Sorghum bicolor) diversity panel using a number of technologies including conventional color photography and hyperspectral imaging. This imaging lasted for 37 d and centered on reproductive transition. A new higher density marker set was generated for the same population. Several genes known to control trait variation in sorghum have been previously cloned and characterized. These genes were not confidently identified in genomewide association analyses at single time points. However, FPCA successfully identified the same known and characterized genes. FPCA analyses partitioned the role these genes play in controlling phenotypes. Partitioning was consistent with the known molecular function of the individual cloned genes. These data demonstrate that FPCA-based genome-wide association studies can enable robust time series mapping analyses in a wide range of contexts. Moreover, time series analysis can increase the accuracy and power of quantitative genetic analyses.
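As a rough illustration of how FPCA can turn a time series into a small number of per-genotype scores suitable for association mapping, the sketch below applies a discretized FPCA (ordinary PCA on curves sampled on a shared, regular time grid) to synthetic data. The abstract does not state the implementation used in the study, so the function name `fpca_scores`, the synthetic growth curves, and the 50-genotype sample size are all assumptions made for illustration.

```python
import numpy as np

def fpca_scores(curves, n_components=2):
    """Discretized FPCA for curves observed on a shared, regular time grid.

    curves: array of shape (n_genotypes, n_timepoints).
    Returns the mean curve, the leading components (discretized
    eigenfunctions), the per-genotype scores, and the fraction of
    variance explained by each retained component.
    """
    curves = np.asarray(curves, dtype=float)
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are orthonormal directions of variation over time.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T
    explained = (s ** 2) / np.sum(s ** 2)
    return mean_curve, components, scores, explained[:n_components]

# Synthetic example (hypothetical data): 50 genotypes imaged on 37 days.
rng = np.random.default_rng(0)
days = np.linspace(0, 36, 37)
n_genotypes = 50
amplitude = rng.normal(0.0, 1.0, n_genotypes)  # hidden genotype effect
shared_growth = 100.0 / (1.0 + np.exp(-(days - 18.0) / 4.0))
curves = (
    shared_growth
    + amplitude[:, None] * np.sin(np.pi * days / 36.0)
    + rng.normal(0.0, 0.05, (n_genotypes, days.size))
)

mean_curve, components, scores, explained = fpca_scores(curves)
# scores[:, 0] could then serve as the trait value in a standard GWAS.
```

In this sketch each genotype is summarized by one score per component, so a single association test per component replaces a separate test at every time point.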
High-throughput phenotyping systems provide abundant data for statistical analysis through plant imaging. Before usable data can be obtained, image processing must take place. In this study, we used supervised learning methods to segment plants from the background in such images and compared them with commonly used thresholding methods. Because obtaining accurate training data is a major obstacle to using supervised learning methods for segmentation, a novel approach to producing accurate labels was developed. We demonstrated that, with careful selection of training data through such an approach, supervised learning methods, and neural networks in particular, can outperform thresholding methods at segmentation. High-throughput plant phenotyping is a broad umbrella. The field includes researchers working in plant biology, engineering, computer science, and statistics. A common goal unites the field: making the collection of plant trait data as efficient and scalable as the collection of plant genetic data.
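For readers unfamiliar with the thresholding baseline mentioned in the segmentation abstract, a minimal sketch follows. It uses the excess green index (2g - r - b on normalized color channels) with a fixed cutoff, which is one commonly used thresholding approach; the abstract does not say which thresholding methods were compared, so the index, the cutoff of 0.1, the function name `excess_green_mask`, and the toy image are assumptions.

```python
import numpy as np

def excess_green_mask(rgb, threshold=0.1):
    """Label a pixel as plant when its excess green index exceeds threshold.

    rgb: array of shape (height, width, 3) holding an RGB image.
    """
    rgb = np.asarray(rgb, dtype=float)
    total = rgb.sum(axis=-1)
    total[total == 0] = 1.0  # avoid division by zero on pure black pixels
    r, g, b = (rgb[..., i] / total for i in range(3))
    exg = 2.0 * g - r - b
    return exg > threshold

# Toy image (hypothetical): gray background with a green rectangle as "plant".
image = np.full((20, 20, 3), 200, dtype=np.uint8)
image[5:15, 8:12] = (40, 160, 30)

mask = excess_green_mask(image)
plant_pixels = int(mask.sum())  # pixel count is a simple proxy for plant area
```

A fixed cutoff like this needs no training data, which is its main appeal; it is also why it can fail when lighting or background color shifts between images.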
Maize (Zea mays ssp. mays) is one of three crops, along with rice and wheat, responsible for more than one-half of all calories consumed around the world. Increasing the yield and stress tolerance of these crops is essential to meet the growing need for food. The cost and speed of plant phenotyping are currently the largest constraints on plant breeding efforts. Datasets linking new types of high-throughput phenotyping data collected from plants to the performance of the same genotypes under agronomic conditions across a wide range of environments are essential for developing new statistical approaches and computer vision-based tools. A set of maize inbreds (primarily recently off-patent lines) were phenotyped using a high-throughput platform at University of Nebraska-Lincoln. These lines have been previously subjected to high-density genotyping and scored for a core set of 13 phenotypes in field trials across 13 North American states in two years by the Genomes to Fields consortium. A total of 485 GB of image data including RGB, hyperspectral, fluorescence, and thermal infrared photos has been released. Correlations between image-based measurements and manual measurements demonstrated the feasibility of quantifying variation in plant architecture using image data. However, naive approaches to measuring traits such as biomass can introduce nonrandom measurement errors confounded with genotype variation. Analysis of hyperspectral image data demonstrated unique signatures from stem tissue. Integrating heritable phenotypes from high-throughput phenotyping data with field data from different environments can reveal previously unknown factors influencing yield plasticity.
Background: Maize (Zea mays ssp. mays) is 1 of 3 crops, along with rice and wheat, responsible for more than one-half of all calories consumed around the world. Increasing the yield and stress tolerance of these crops is essential to meet the growing need for food. The cost and speed of plant phenotyping are currently the largest constraints on plant breeding efforts. Datasets linking new types of high-throughput phenotyping data collected from plants to the performance of the same genotypes under agronomic conditions across a wide range of environments are essential for developing new statistical approaches and computer vision–based tools. Findings: A set of maize inbreds (primarily recently off-patent lines) were phenotyped using a high-throughput platform at University of Nebraska-Lincoln. These lines have been previously subjected to high-density genotyping and scored for a core set of 13 phenotypes in field trials across 13 North American states in 2 years by the Genomes to Fields Consortium. A total of 485 GB of image data including RGB, hyperspectral, fluorescence, and thermal infrared photos has been released. Conclusions: Correlations between image-based measurements and manual measurements demonstrated the feasibility of quantifying variation in plant architecture using image data. However, naive approaches to measuring traits such as biomass can introduce nonrandom measurement errors confounded with genotype variation. Analysis of hyperspectral image data demonstrated unique signatures from stem tissue. Integrating heritable phenotypes from high-throughput phenotyping data with field data from different environments can reveal previously unknown factors that influence yield plasticity.
Recent advances in automated plant phenotyping have enabled the collection of time series measurements of a wide range of traits from the same plants over different developmental time scales. The availability of time series phenotypic datasets has increased interest in statistical approaches for comparing patterns of change between different plant genotypes and different treatment conditions. Two widely used methods of modeling growth over time are point-wise analysis of variance (ANOVA) and parametric sigmoidal curve fitting. Point-wise ANOVA yields discontinuous growth curves, which do not reflect the true dynamics of growth patterns in plants. In contrast, fitting a parametric model to a time series of observations does capture the trend of growth; however, these models require assumptions regarding the true pattern of plant growth. Depending on the species, treatment regime, and subset of the plant lifecycle sampled, these assumptions will not always hold true. Here we introduce a different approach, functional ANOVA, which yields continuous growth curves without requiring assumptions regarding patterns of plant growth. We compare and validate this approach using data from an experiment measuring growth of two maize (Zea mays ssp. mays) genotypes under two water availability treatments over a 21-day period. Functional ANOVA enables a nonparametric estimation of the dynamics of changes in plant traits over time without assumptions regarding curve shape. In addition to estimating smooth curves of trait values over time, functional ANOVA also estimates the derivatives of these curves (e.g., growth rates) simultaneously. Using two different subsampling strategies, we demonstrate that this functional ANOVA method enables the comparison of growth curves between plants phenotyped on non-overlapping days with little reduction in estimation accuracy.
This means functional ANOVA based approaches can allow larger numbers of samples and biological replicates to be scored in a single experiment given fixed amounts of phenotyping infrastructure and personnel.
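The idea of recovering comparable smooth curves from plants imaged on non-overlapping days can be sketched with a simple penalized smoother. The code below is not functional ANOVA as used in the study; it substitutes a Whittaker-style smoother (squared error on observed days plus a second-difference roughness penalty) to show how a curve and its derivative can be estimated on a full daily grid from alternating-day observations. The function name `whittaker_smooth`, the penalty weight `lam`, and the synthetic logistic growth data are assumptions.

```python
import numpy as np

def whittaker_smooth(y, observed, lam=2.0):
    """Estimate a smooth curve on the full grid from partly observed data.

    Minimizes sum over observed days of (y - z)^2 plus
    lam * sum of squared second differences of z.
    y: measurements on the full grid (values on unobserved days are ignored).
    observed: boolean array marking which days were actually measured.
    """
    observed = np.asarray(observed, dtype=bool)
    y = np.where(observed, np.asarray(y, dtype=float), 0.0)
    n = y.size
    weights = np.diag(observed.astype(float))
    second_diff = np.diff(np.eye(n), n=2, axis=0)  # shape (n - 2, n)
    penalty = lam * second_diff.T @ second_diff
    return np.linalg.solve(weights + penalty, weights @ y)

# Synthetic 21-day growth series (hypothetical data).
days = np.arange(21)
true_curve = 50.0 / (1.0 + np.exp(-(days - 10.0) / 3.0))
rng = np.random.default_rng(1)
noisy = true_curve + rng.normal(0.0, 0.5, days.size)

# Two subsampling schedules with no days in common.
odd_days = days % 2 == 1
even_days = ~odd_days
curve_even = whittaker_smooth(noisy, even_days)
curve_odd = whittaker_smooth(noisy, odd_days)

# Growth rate as the numerical derivative of the smoothed curve.
growth_rate_even = np.gradient(curve_even, days)
max_gap = float(np.max(np.abs(curve_even - curve_odd)))
```

Because both estimates live on the same daily grid, the two curves can be compared directly even though no single day was measured under both schedules.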
In a plant science Root Image Study, the process of seedling roots bending in response to gravity is recorded using digital cameras, and the bending rates are modeled as functional plant phenotype data. The functional phenotypes are collected from seeds representing a large variety of genotypes and have a three-level nested hierarchical structure, with seeds nested in groups nested in genotypes. The seeds are imaged on different days of the lunar cycle, and an important scientific question is whether there are lunar effects on root bending. We allow the mean function of the bending rate to depend on the lunar day and model the phenotypic variation between genotypes, groups of seeds imaged together, and individual seeds by hierarchical functional random effects. We estimate the covariance functions of the functional random effects by a fast penalized tensor product spline approach, perform multi-level functional principal component analysis (FPCA) using the best linear unbiased predictor of the principal component scores, and improve the efficiency of mean estimation by iterative decorrelation. We choose the number of principal components using a conditional Akaike information criterion and test the lunar day effect using generalized likelihood ratio test statistics based on the marginal and conditional likelihoods. We also propose a permutation procedure to evaluate the null distribution of the test statistics. Our simulation studies show that our model selection criterion selects the correct number of principal components with remarkably high frequency, and the likelihood-based tests based on FPCA have higher power than a test based on working independence. Supplementary materials for this article are available online.
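The permutation procedure for approximating a null distribution can be illustrated in a much simpler setting than the one described above. The sketch below shuffles group labels and uses an absolute difference in means as the test statistic; the study instead uses generalized likelihood ratio statistics within a multi-level functional model, so the statistic, the function name `permutation_pvalue`, and the example values are assumptions chosen only to show the mechanics.

```python
import numpy as np

def permutation_pvalue(values, groups, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    values: 1-D array of measurements.
    groups: boolean array of the same length assigning each value to a group.
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups, dtype=bool)
    rng = np.random.default_rng(seed)

    def statistic(assignment):
        return abs(values[assignment].mean() - values[~assignment].mean())

    observed = statistic(groups)
    exceed = 0
    for _ in range(n_permutations):
        # Shuffling the labels simulates the null of no group effect.
        if statistic(rng.permutation(groups)) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_permutations + 1)

# Hypothetical bending-rate summaries for seeds imaged on two lunar days.
values = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 2.0, 2.2, 1.9, 2.1, 2.3, 2.0]
groups = [True] * 6 + [False] * 6
p_value = permutation_pvalue(values, groups)
```

The same loop applies to any test statistic: only the `statistic` function changes, which is what makes permutation a convenient way to calibrate statistics whose null distribution is hard to derive.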
Analytical techniques such as NMR and mass spectrometry can generate large metabolomics data sets containing thousands of spectral features derived from numerous biological observations. Multivariate data analysis is routinely used to uncover the underlying biological information contained within these large metabolomics data sets. This is typically accomplished by classifying the observations into groups (e.g., control versus treated) and by identifying associated discriminating features. There are a variety of classification models to select from, which include some well-established techniques (e.g., principal component analysis [PCA], orthogonal projection to latent structure [OPLS], or partial least-squares projection to latent structures [PLS]) and newly emerging machine learning algorithms (e.g., support vector machines or random forests). However, it is unclear which classification model, if any, is an optimal choice for the analysis of metabolomics data. Herein, we present a comprehensive evaluation of five common classification models routinely employed in the metabolomics field and that are also currently available in our MVAPACK metabolomics software package. Simulated and experimental NMR data sets with various levels of group separation were used to evaluate each model. Model performance was assessed by classification accuracy rate, by the area under a receiver operating characteristic (AUROC) curve, and by the identification of true discriminating features. Our findings suggest that the five classification models perform equally well with robust data sets. Only when the models are stressed with subtle data set differences does OPLS emerge as the best-performing model. OPLS maintained a high-prediction accuracy rate and a large area under the ROC curve while yielding loadings closest to the true loadings with limited group separations.
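Two of the performance measures named above, classification accuracy and AUROC, can be computed directly from class labels and model scores. The sketch below is a plain NumPy illustration and is not taken from MVAPACK; the function names, the 0.5 cutoff, and the six example observations are assumptions. AUROC is computed as the fraction of (treated, control) pairs in which the treated observation receives the higher score, with ties counted as one-half.

```python
import numpy as np

def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    positives = scores[labels]
    negatives = scores[~labels]
    greater = (positives[:, None] > negatives[None, :]).sum()
    ties = (positives[:, None] == negatives[None, :]).sum()
    return (greater + 0.5 * ties) / (positives.size * negatives.size)

def accuracy(labels, scores, cutoff=0.5):
    """Fraction of observations classified correctly at a fixed cutoff."""
    labels = np.asarray(labels, dtype=bool)
    predicted = np.asarray(scores, dtype=float) >= cutoff
    return float(np.mean(predicted == labels))

# Hypothetical scores for three control (0) and three treated (1) samples.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.40, 0.60, 0.35, 0.80, 0.90]

auc = auroc(labels, scores)      # 7 of 9 pairs ranked correctly
acc = accuracy(labels, scores)   # 4 of 6 samples classified correctly
```

Accuracy depends on the chosen cutoff while AUROC does not, which is one reason the two measures are often reported together.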