High-throughput sequencing technologies allow easy characterization of the human microbiome, but the statistical methods to analyze microbiome data are still in their infancy. Differential abundance methods aim to detect associations between the abundances of bacterial species and subject grouping factors. The results of such methods are important for identifying the microbiome as a prognostic or diagnostic biomarker or for demonstrating the efficacy of prodrug or antibiotic drugs. Because benchmarking studies are lacking in the microbiome field, no consensus exists on the performance of the statistical methods. We have compared a large number of popular methods through extensive parametric and nonparametric simulation as well as real data shuffling algorithms. The results are consistent across the different approaches and all point to an alarming excess of false discoveries. This raises serious doubts about the reliability of discoveries in past studies and imperils the reproducibility of microbiome experiments. To further improve method benchmarking, we introduce a new simulation tool that generates correlated count data following any univariate count distribution; the correlation structure may be inferred from real data. Most simulation studies discard the correlation between species, but our results indicate that this correlation can negatively affect the performance of statistical methods.
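The abstract does not say how its simulation tool produces correlated counts. One common way to obtain correlated count data with arbitrary univariate margins is a Gaussian copula: draw correlated standard normals, convert them to uniforms, and push each uniform through the quantile function of the desired count distribution. The sketch below illustrates that idea only, using the Python standard library; the function names, the Poisson margins, and the correlation value are all assumptions made for illustration and are not taken from the authors' tool.

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, mean):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(mean)."""
    u = min(u, 1.0 - 1e-12)  # guard against u rounding to exactly 1.0
    k = 0
    pmf = math.exp(-mean)
    cdf = pmf
    while cdf < u and k < 10_000:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def simulate_correlated_counts(corr, quantile_fns, n_samples, seed=0):
    """Gaussian-copula sketch: one row per sample, one column per species.

    corr         -- correlation matrix of the latent normal variables
    quantile_fns -- one quantile function per species (any count distribution)
    """
    rng = random.Random(seed)
    lower = cholesky(corr)
    std_normal = NormalDist()
    p = len(corr)
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Correlate the independent normals with the Cholesky factor.
        y = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
        # Map to uniforms, then to counts via each marginal quantile function.
        u = [std_normal.cdf(v) for v in y]
        samples.append([q(ui) for q, ui in zip(quantile_fns, u)])
    return samples


# Hypothetical example: two species with Poisson margins (means 5 and 20)
# and a latent correlation of 0.8.
counts = simulate_correlated_counts(
    corr=[[1.0, 0.8], [0.8, 1.0]],
    quantile_fns=[lambda u: poisson_quantile(u, 5.0),
                  lambda u: poisson_quantile(u, 20.0)],
    n_samples=2000,
    seed=1,
)
```

Note that the correlation of the resulting counts is generally somewhat weaker than the latent normal correlation, so in practice the latent matrix has to be tuned if a specific count-level correlation is required.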
The influence of chemical composition on isothermal cocoa butter crystallization was investigated quantitatively. Apart from the fatty acid and triacylglycerol profile, the amounts of some minor components (diacylglycerols, free fatty acids, phospholipids, soap, unsaponifiable matter, iron, and primary oxidation products) were determined. Using forward model selection, a multiple linear regression model was established that shows the influence of the chemical characteristics on the crystallization parameters of the model for fat crystallization kinetics developed by Foubert and others (2002). The ratios of saturated to unsaturated fatty acids and of monounsaturated to diunsaturated triacylglycerols have the most important effect on the amount of crystallization, the induction time of the 2nd step of the crystallization process, and the order of the reverse reaction. The more unsaturated fatty acids and the more diunsaturated triacylglycerols, the lower the amount of crystallization, the longer the induction time for the 2nd step of crystallization, and the lower the order of the reverse reaction. The amount of diacylglycerols has the most important (negative) influence on the rate constant. Other minor components with a rather pronounced influence on different crystallization parameters are the free fatty acids, phospholipids, and traces of soap.

saturated triacylglycerols. Loisel and others (1998) found that during crystallization of cocoa butter in a lab-scale scraped surface heat exchanger, which leads to formation of β′ and β polymorphs, tristearoylglycerol (SSS) crystallizes 1st as a separate fraction due to its limited solubility in monounsaturated and diunsaturated triacylglycerols. Adding extra SSS consequently shortens the induction time of the 1st crystallization step but does not affect the crystallization of the remaining triacylglycerols.
Hachiya and others (1989) studied the effect of seeding on the crystallization kinetics (isothermal, agitated crystallization at 30 °C) of cocoa butter and dark chocolate. They concluded that SSS in the β form does not appreciably accelerate the crystallization despite its high melting point, whereas SOS in the β form does enhance the crystallization rate. Pontillon (1998) observed that free fatty acids increase the crystallization time of cocoa butter, but only at concentrations above 2%. Shukla (1995) and Ziegleder (1988) (isothermal crystallization at temperatures between 19 °C and 23 °C, leading to crystallization in the β′ polymorph) noticed that cocoa butters with a higher diacylglycerol level crystallize more slowly. Gutshall-Zakis and Dimick (1993) showed that slow-nucleating cocoa butters (crystallized dynamically at 26.5 °C in the β polymorph) contain higher amounts of phospholipids. However, Savage and Dimick (1995) and Chaiseri and Dimick (1995) (isothermal crystallization at 26.5 °C under mild agitation, leading to crystallization in polymorphs β′ and β) only found a correlation between the nucleation times an...
Background: Long non-coding RNAs (lncRNAs) are typically expressed at low levels and are inherently highly variable. This is a fundamental challenge for differential expression (DE) analysis. In this study, the performance of 25 pipelines for testing DE in RNA-seq data is comprehensively evaluated, with a particular focus on lncRNAs and low-abundance mRNAs. Fifteen performance metrics are used to evaluate DE tools and normalization methods using simulations and analyses of six diverse RNA-seq datasets.

Results: Gene expression data are simulated using non-parametric procedures in such a way that realistic levels of expression and variability are preserved in the simulated data. Throughout the assessment, results for mRNA and lncRNA were tracked separately. All the pipelines exhibit inferior performance for lncRNAs compared to mRNAs across all simulated scenarios and benchmark RNA-seq datasets. The substandard performance of DE tools for lncRNAs applies also to low-abundance mRNAs. No single tool uniformly outperformed the others. Variability, number of samples, and fraction of DE genes markedly influenced DE tool performance.

Conclusions: Overall, linear modeling with empirical Bayes moderation (limma) and a non-parametric approach (SAMSeq) showed good control of the false discovery rate and reasonable sensitivity. Of note, for achieving a sensitivity of at least 50%, more than 80 samples are required when studying expression levels in realistic settings such as in clinical cancer research. About half of the methods showed a substantial excess of false discoveries, making these methods unreliable for DE analysis and jeopardizing reproducible science.
The detailed results of our study can be consulted through a user-friendly web application, giving guidance on selection of the optimal DE tool (http://statapps.ugent.be/tools/AppDGE/).

Electronic supplementary material: The online version of this article (10.1186/s13059-018-1466-5) contains supplementary material, which is available to authorized users.
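The two quantities the conclusions rely on, false discovery rate and sensitivity, can be computed directly once the truly DE genes are known, as they are in a simulation. The sketch below is illustrative only: it assumes the calls come from a Benjamini-Hochberg adjustment at a nominal 5% level, which the abstract does not state, and the p-values and the set of truly DE genes are made-up toy values, not results from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value falls under its threshold.
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    return {order[r] for r in range(cutoff_rank)}


def fdr_and_sensitivity(called, truly_de):
    """Observed false discovery proportion and sensitivity of a set of calls."""
    called = set(called)
    truly_de = set(truly_de)
    true_pos = len(called & truly_de)
    false_pos = len(called - truly_de)
    fdr = false_pos / len(called) if called else 0.0
    sensitivity = true_pos / len(truly_de) if truly_de else float("nan")
    return fdr, sensitivity


# Toy example (hypothetical values): six genes, of which genes 0 and 2
# are truly differentially expressed.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.60]
truly_de = {0, 2}

called = benjamini_hochberg(p_values, alpha=0.05)
fdr, sensitivity = fdr_and_sensitivity(called, truly_de)
print(sorted(called), fdr, sensitivity)  # prints: [0, 1] 0.5 0.5
```

In a benchmark these two numbers would be averaged over many simulated datasets; a method controls the false discovery rate when the average false discovery proportion stays at or below the nominal level.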