Background: Extracting relevant biological information from large data sets is a major challenge in functional genomics research. Different aspects of the data hamper their biological interpretation. For instance, 5000-fold differences in concentration for different metabolites are present in a metabolomics data set, while these differences are not proportional to the biological relevance of these metabolites. However, data analysis methods are not able to make this distinction. Data pretreatment methods can correct for aspects that hinder the biological interpretation of metabolomics data sets by emphasizing the biological information in the data set and thus improving their biological interpretability.
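As a rough illustration of what pretreatment does to concentration differences of this size, the sketch below centers and scales two metabolite columns. The abstract does not name specific methods; autoscaling and Pareto scaling are used here only as two common examples, and the function name `pretreat` and the concentration values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pretreat(column, method="auto"):
    """Center one metabolite column, then scale it.

    method: "center" (no scaling), "auto" (divide by the SD),
    or "pareto" (divide by the square root of the SD).
    Assumes the column is not constant (SD > 0).
    """
    m = mean(column)
    s = stdev(column)
    divisors = {"center": 1.0, "auto": s, "pareto": sqrt(s)}
    if method not in divisors:
        raise ValueError(f"unknown method: {method}")
    return [(x - m) / divisors[method] for x in column]


# Hypothetical concentrations: one abundant and one scarce metabolite.
abundant = [5000.0, 5200.0, 4800.0, 5100.0]
scarce = [1.0, 1.4, 0.6, 1.2]

for name, column in [("abundant", abundant), ("scarce", scarce)]:
    for method in ("center", "auto", "pareto"):
        spread = stdev(pretreat(column, method))
        print(f"{name:8s} {method:7s} SD after pretreatment = {spread:.3f}")
```

With centering alone the abundant metabolite still dominates the variance; autoscaling gives both columns unit standard deviation, and Pareto scaling sits in between.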
Classifying groups of individuals based on their metabolic profile is one of the main topics in metabolomics research. Due to the low number of individuals compared to the large number of variables, this is not an easy task. PLSDA is one of the data analysis methods used for the classification. Unfortunately, this method readily overfits the data, and rigorous validation is necessary. The validation, however, is far from straightforward. In this paper we discuss a strategy based on cross model validation and permutation testing to validate the classification models. It is also shown that overly optimistic results are obtained when the validation is not done properly. Furthermore, we advocate against the use of PLSDA score plots for inference of class differences.
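The permutation-testing idea can be sketched as follows. This is not the paper's procedure: a nearest-centroid classifier with leave-one-out cross-validation stands in for PLSDA with cross model validation, and the function names and data are hypothetical. The point it illustrates is that the observed error count is compared with the error counts obtained after randomly reassigning the class labels.

```python
import random


def loo_errors(X, y):
    """Leave-one-out misclassification count for a nearest-centroid classifier."""
    errors = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        centroids = {}
        for lab in sorted(set(lab for _, lab in train)):
            rows = [x for x, l in train if l == lab]
            centroids[lab] = [sum(col) / len(rows) for col in zip(*rows)]
        # Predict the class whose centroid is closest to the held-out sample.
        predicted = min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(X[i], centroids[lab])),
        )
        errors += predicted != y[i]
    return errors


def permutation_p_value(X, y, n_perm=200, seed=0):
    """Fraction of label permutations that classify at least as well as the real labels."""
    rng = random.Random(seed)
    observed = loo_errors(X, y)
    labels = list(y)
    as_good = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if loo_errors(X, labels) <= observed:
            as_good += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return observed, (as_good + 1) / (n_perm + 1)


# Hypothetical data: two well-separated groups, five individuals each.
X = [[0.1, 0.3], [0.4, -0.2], [-0.3, 0.1], [0.2, 0.5], [-0.1, -0.4],
     [10.2, 9.8], [9.7, 10.4], [10.5, 10.1], [9.9, 9.6], [10.3, 10.2]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

observed, p = permutation_p_value(X, y)
print(f"observed errors = {observed}, permutation p-value = {p:.3f}")
```

The whole validation loop is rerun for every permutation, so the null distribution reflects the same amount of model fitting as the real labels.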
We describe ASCA, a new method that can deal with complex multivariate datasets containing an underlying experimental design, such as metabolomics datasets. It is a direct generalization of analysis of variance (ANOVA) for univariate data to the multivariate case. The method allows for easy interpretation of the variation induced by the different factors of the design. The method is illustrated with a dataset from a metabolomics experiment with time and dose factors.
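A minimal sketch of the ANOVA-style partitioning that ASCA builds on, assuming a balanced two-factor design: the centered data matrix is split into effect matrices for time, dose, their interaction, and a residual. In ASCA each effect matrix would then be summarized by a component analysis, a step omitted here. The function names and the data are hypothetical.

```python
def column_means(rows):
    """Mean of each variable (column) over the given rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def effect_matrix(X, levels):
    """Replace each row by the mean of all rows that share its factor level."""
    means = {
        lv: column_means([x for x, l in zip(X, levels) if l == lv])
        for lv in set(levels)
    }
    return [means[l] for l in levels]


def subtract(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def partition(X, time, dose):
    """Split centered X into time, dose, interaction and residual parts.

    Assumes a balanced design (equal replicates in every time-by-dose cell).
    """
    grand = column_means(X)
    Xc = [[v - g for v, g in zip(row, grand)] for row in X]
    X_time = effect_matrix(Xc, time)
    X_dose = effect_matrix(Xc, dose)
    cell = effect_matrix(Xc, list(zip(time, dose)))
    X_inter = subtract(subtract(cell, X_time), X_dose)
    X_resid = subtract(Xc, cell)
    return Xc, X_time, X_dose, X_inter, X_resid


def sum_of_squares(M):
    return sum(v * v for row in M for v in row)


# Hypothetical data: 2 time points x 2 doses x 2 replicates, 2 metabolites.
time = [0, 0, 0, 0, 1, 1, 1, 1]
dose = [0, 0, 1, 1, 0, 0, 1, 1]
X = [[1.0, 5.0], [1.2, 5.3], [2.1, 5.1], [1.9, 4.8],
     [3.0, 6.2], [3.3, 6.0], [5.2, 6.1], [4.8, 5.9]]

Xc, X_time, X_dose, X_inter, X_resid = partition(X, time, dose)
for name, part in [("time", X_time), ("dose", X_dose),
                   ("interaction", X_inter), ("residual", X_resid)]:
    share = 100 * sum_of_squares(part) / sum_of_squares(Xc)
    print(f"{name:12s} {share:5.1f}% of total variation")
```

Because the design is balanced, the sums of squares of the four parts add up to the total, which is what makes the per-factor percentages interpretable.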
Metabolomics data obtained from (human) nutritional intervention studies can have a rather complex structure that depends on the underlying experimental design. In this paper we discuss the complex structure in data caused by a cross-over designed experiment. In such a design, each subject in the study population acts as his or her own control and makes the data paired. For a single univariate response a paired t-test or repeated measures ANOVA can be used to test the differences between the paired observations. The same principle holds for multivariate data. In the current paper we compare a method that exploits the paired data structure in cross-over multivariate data (multilevel PLSDA) with a method that is often used by default but that ignores the paired structure (OPLSDA). The results from both methods have been evaluated in a small simulated example as well as in a genuine data set from a cross-over designed nutritional metabolomics study. It is shown that exploiting the paired data structure underlying the cross-over design considerably improves the power and the interpretability of the multivariate solution. Furthermore, the multilevel approach provides complementary information about (I) the diversity and abundance of the treatment effects within the different (subsets of) subjects across the study population, and (II) the intrinsic differences between these study subjects.
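The paired structure can be exploited by first splitting the data into between-subject and within-subject variation, which is the preprocessing idea behind multilevel approaches. The sketch below shows only that split, not the subsequent PLSDA step; the function names and the cross-over data are hypothetical.

```python
def column_means(rows):
    """Mean of each variable (column) over the given rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def split_between_within(X, subjects):
    """Split X into between-subject and within-subject variation.

    between: each subject's mean profile minus the grand mean
    within:  each observation minus its own subject's mean profile
    """
    grand = column_means(X)
    subject_mean = {
        s: column_means([x for x, sid in zip(X, subjects) if sid == s])
        for s in set(subjects)
    }
    between = [[m - g for m, g in zip(subject_mean[s], grand)] for s in subjects]
    within = [[v - m for v, m in zip(x, subject_mean[s])]
              for x, s in zip(X, subjects)]
    return grand, between, within


# Hypothetical cross-over data: 3 subjects, each measured under both
# treatments, 2 metabolites. Subjects differ far more than treatments do.
subjects = ["A", "A", "B", "B", "C", "C"]
treatment = ["control", "treated"] * 3
X = [[10.0, 3.0], [11.0, 3.1],
     [50.0, 7.0], [51.2, 6.9],
     [30.0, 5.0], [30.9, 5.2]]

grand, between, within = split_between_within(X, subjects)
for s, t, row in zip(subjects, treatment, within):
    print(s, t, [round(v, 2) for v in row])
```

In the within-subject part the large baseline differences between subjects are gone, so the consistent increase of the first metabolite under treatment becomes visible.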
Background: While most cells in multicellular organisms carry the same genetic information, in each cell type only a subset of genes is being transcribed. Such differentiation in gene expression depends, for a large part, on the activation and repression of regulatory sequences, including transcriptional enhancers. Transcriptional enhancers can be located tens of kilobases from their target genes, but display characteristic chromatin and DNA features, allowing their identification by genome-wide profiling. Here we show that integration of chromatin characteristics can be applied to predict distal enhancer candidates in Zea mays, thereby providing a basis for a better understanding of gene regulation in this important crop plant. Results: To predict transcriptional enhancers in the crop plant maize (Zea mays L. ssp. mays), we integrated available genome-wide DNA methylation data with newly generated maps for chromatin accessibility and histone 3 lysine 9 acetylation (H3K9ac) enrichment in young seedling and husk tissue. Approximately 1500 intergenic regions, displaying low DNA methylation, high chromatin accessibility and H3K9ac enrichment, were classified as enhancer candidates. Based on their chromatin profiles, candidate sequences can be classified into four subcategories. Tissue-specificity of enhancer candidates is defined based on the tissues in which they are identified and putative target genes are assigned based on tissue-specific expression patterns of flanking genes. Conclusions: Our method identifies three previously identified distal enhancers in maize, validating the new set of enhancer candidates and enlarging the toolbox for the functional characterization of gene regulation in the highly repetitive maize genome. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-017-1273-4) contains supplementary material, which is available to authorized users.
The plant root is the first organ to encounter salinity stress, but the effect of salinity on root system architecture (RSA) remains elusive. Both the reduction in main root (MR) elongation and the redistribution of the root mass between MRs and lateral roots (LRs) are likely to play crucial roles in water extraction efficiency and ion exclusion. To establish which RSA parameters are responsive to salt stress, we performed a detailed time course experiment in which Arabidopsis (Arabidopsis thaliana) seedlings were grown on agar plates under different salt stress conditions. We captured RSA dynamics with quadratic growth functions (ROOT-FIT) and summarized the salt-induced differences in RSA dynamics in three growth parameters: MR elongation, average LR elongation, and increase in number of LRs. In the ecotype Columbia-0 accession of Arabidopsis, salt stress affected MR elongation more severely than LR elongation and an increase in LRs, leading to a significantly altered RSA. By quantifying RSA dynamics of 31 different Arabidopsis accessions in control and mild salt stress conditions, different strategies for regulation of MR and LR meristems and root branching were revealed. Different RSA strategies partially correlated with natural variation in abscisic acid sensitivity and different Na+/K+ ratios in shoots of seedlings grown under mild salt stress. Applying ROOT-FIT to describe the dynamics of RSA allowed us to uncover the natural diversity in root morphology and cluster it into four response types that otherwise would have been overlooked.
A new strategy for the simultaneous modeling of molecular weight distribution (MWD) and degree of branching distribution (DBD) for such branched polymers as bimodal low-density polyethylene is presented, based on the Galerkin h-p finite element package PREDICI, a commercial code. The key problem of how to address a bidimensional distribution is successfully solved by using so-called reduced or pseudo distributions. The branching distribution per chain length is modeled by moment equations, thus yielding distributions of branching moments over chain length. No closure relationships are required. The MWD/DBD curves obtained are the most probable ones for the given reaction mechanisms and kinetic data. Simulated MWD and DBD curves are compared to experimental data from gel permeation chromatography and light scattering; the agreement found is good in general and excellent in one case. The bimodal MWD of the autoclave low-density polyethylene (ldPE) IUPAC Alpha could be reproduced well. It is finally shown that the shapes of MWD and DBD are highly sensitive, quantitative measures for random scission.