The Zika virus (ZIKV) has captured worldwide attention with the ongoing epidemic in South America and its link to severe birth defects, most notably microcephaly. ZIKV spreads to humans through a combination of vector and sexual transmission, but the relative contribution of these transmission routes to the overall epidemic remains largely unknown. Furthermore, a disparity in the reported number of infections between males and females has been observed. We develop a mathematical model that describes the transmission dynamics of ZIKV to determine the processes driving the observed epidemic patterns. Our model reveals a 4.8% contribution of sexual transmission to the basic reproductive number, R0. This contribution is too small to sustain an outbreak on its own, suggesting that vector transmission is the main driver of the ongoing epidemic. We also find a minor, yet statistically significant, difference in the mean number of cases between males and females, both at the peak of the epidemic and at equilibrium. While this suggests an intrinsic disparity between males and females, the difference does not account for the vastly greater number of reported cases among females, indicative of a large reporting bias. In addition, we identify conditions under which sexual transmission may play a key role in sparking an epidemic, including in temperate areas where ZIKV mosquito vectors are less prevalent.
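The two-route structure described above can be sketched with a next-generation-matrix calculation. The parameter values and the attribution rule below are hypothetical illustrations, not the paper's model or estimates:

```python
import numpy as np

# Hypothetical sub-reproduction numbers -- NOT the paper's estimates.
R_hh = 0.10   # human-to-human (sexual) transmission
R_hv = 2.0    # infections in vectors caused by one infectious human
R_vh = 1.5    # infections in humans caused by one infectious vector

# Next-generation matrix for a two-class (human, vector) model;
# vectors do not infect other vectors, so the (vector, vector) entry is 0.
K = np.array([[R_hh, R_vh],
              [R_hv, 0.0]])

R0 = max(np.linalg.eigvals(K).real)  # basic reproductive number = spectral radius

# One simple way to attribute the sexual route: compare R0 with and
# without the human-to-human term.
K_no_sex = K.copy()
K_no_sex[0, 0] = 0.0
R0_no_sex = max(np.linalg.eigvals(K_no_sex).real)
contribution = (R0 - R0_no_sex) / R0
print(f"R0 = {R0:.3f}, sexual-transmission contribution = {100 * contribution:.1f}%")
```

With these illustrative values the sexual route contributes only a few percent of R0, mirroring the qualitative conclusion above: too small to sustain an outbreak alone.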
Summary In the analysis of single-cell RNA sequencing data, researchers often characterize the variation between cells by estimating a latent variable, such as cell type or pseudotime, representing some aspect of the cell’s state. They then test each gene for association with the estimated latent variable. If the same data are used for both of these steps, then standard methods for computing p-values in the second step will fail to achieve statistical guarantees such as Type 1 error control. Furthermore, approaches such as sample splitting that can be applied to solve similar problems in other settings are not applicable in this context. In this article, we introduce count splitting, a flexible framework that allows us to carry out valid inference in this setting, for virtually any latent variable estimation technique and inference approach, under a Poisson assumption. We demonstrate the Type 1 error control and power of count splitting in a simulation study and apply count splitting to a data set of pluripotent stem cells differentiating to cardiomyocytes.
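Under the abstract's Poisson assumption, the core construction of count splitting is binomial thinning of each entry of the count matrix; the toy matrix below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cells-by-genes count matrix, Poisson distributed (illustration only).
X = rng.poisson(lam=5.0, size=(200, 10))

eps = 0.5  # fraction of counts allocated to the estimation matrix

# Binomial thinning: if X_ij ~ Poisson(mu_ij), then X_train and X_test are
# independent Poisson(eps * mu_ij) and Poisson((1 - eps) * mu_ij) matrices.
X_train = rng.binomial(X, eps)
X_test = X - X_train

# X_train is used to estimate the latent variable (e.g. pseudotime);
# X_test is used to test each gene for association with that estimate,
# so the estimation and inference steps never touch the same counts.
assert (X_train + X_test == X).all()
```

Because the two matrices are independent under the Poisson model, standard p-values computed on `X_test` retain their Type 1 error guarantees even though the latent variable was estimated from the same cells.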
Background The gluten-free diet (GFD) involves the elimination of wheat and related grains. Wheat is a key fortification vehicle for nutrients such as iron and B vitamins. While there is growing evidence of low nutrient intakes and poor diet quality amongst people following a long-term GFD, few studies have used a dietary pattern approach to analyse top food sources of nutrients in today's complex food environment. Thus, the purpose of this study was to identify food sources of energy and nutrients from previously collected diet records of adults following a GFD. Methods Three 3-day food records were collected from 35 participants in a lifestyle intervention study (n = 240 records). All food items were categorised according to the Bureau of Nutritional Sciences Food Group Codes. Percentages of total dietary intakes from food groups were ranked. Results Mean intakes of dietary fibre, calcium and iron (females) were lower than recommended, with half the sample consuming below the recommended proportion of energy as carbohydrate. Meat, poultry and fish were the top source of energy (19.5%) in the diet. Gluten-free (GF) grain products were the top source of carbohydrate, fibre and iron and the second greatest source of energy. Amongst grains, breakfast/hot cereals, yeast breads, and mixed grain dishes were the greatest nutrient contributors, despite most commercial cereals and breads (65%) being unenriched. Legumes were not frequently consumed. Conclusions GF grains were the top food source of carbohydrate, fibre and iron, despite few brands being enriched or fortified. It is a challenge to assess and monitor nutrient intakes on a GFD due to the lack of nutrient composition data for B vitamins and minerals (other than iron). Dietary planning guidance for the appropriate replacement of nutrients provided by wheat is warranted.
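The ranking step in the Methods (percentages of total dietary intake by food group) can be sketched as follows; the food-record rows here are invented for illustration, not the study's data:

```python
import pandas as pd

# Hypothetical food-record rows; the real study coded each item with the
# Bureau of Nutritional Sciences Food Group Codes.
records = pd.DataFrame({
    "food_group": ["GF grain products", "Meat/poultry/fish",
                   "GF grain products", "Legumes", "Meat/poultry/fish"],
    "energy_kcal": [250, 400, 180, 90, 350],
    "fibre_g": [4.0, 0.0, 3.5, 6.0, 0.0],
})

# Percentage of total intake contributed by each food group, ranked by energy.
pct = (records.groupby("food_group")[["energy_kcal", "fibre_g"]].sum()
       / records[["energy_kcal", "fibre_g"]].sum() * 100)
print(pct.sort_values("energy_kcal", ascending=False).round(1))
```

The same group-sum-and-normalise pattern extends to any nutrient column, which is how a single set of coded records yields the separate energy, carbohydrate, fibre, and iron rankings reported above.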
We consider conducting inference on the output of the Classification and Regression Tree (CART) [Breiman et al., 1984] algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage. Efficient algorithms for computing the necessary conditioning sets are provided. We apply these methods in simulation and to a dataset involving the association between portion control interventions and caloric intake.
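The naive approach this abstract warns against can be made concrete. The sketch below fits a CART-style tree (scikit-learn's implementation, used here as a stand-in) to pure-noise data and then runs an ordinary t-test between two terminal nodes the tree itself selected; the paper's selective test would replace this t-test with one that conditions on the fitted tree:

```python
import numpy as np
from scipy import stats
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)

# Pure-noise data: y has no true relationship with X.
X = rng.normal(size=(200, 2))
y = rng.normal(size=200)

tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=20).fit(X, y)
leaves = tree.apply(X)  # terminal-node id for each observation

# Naive test: compare the mean response in two terminal nodes that the
# tree itself chose -- this double use of the data inflates Type 1 error.
ids = np.unique(leaves)
a, b = y[leaves == ids[0]], y[leaves == ids[1]]
t, p = stats.ttest_ind(a, b)
print(f"naive p-value = {p:.4f}")
```

Repeating this over many noise draws, the naive p-values concentrate well below uniform, which is exactly the failure of Type 1 error control that motivates conditioning on the tree.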
Material- and cell-based technologies such as engineered tissues hold great promise as human therapies. Yet, the development of many of these technologies becomes stalled at the stage of pre-clinical animal studies due to the tedious and low-throughput nature of in vivo implantation experiments. We introduce a plug-and-play in vivo screening array platform called Highly Parallel Tissue Grafting (HPTG). HPTG enables parallelized in vivo screening of 43 three-dimensional microtissues within a single 3D printed device. Using HPTG, we screen microtissue formulations with varying cellular and material components and identify formulations that support vascular self-assembly, integration and tissue function. Our studies highlight the importance of combinatorial studies that vary cellular and material formulation variables concomitantly, by revealing that inclusion of stromal cells can rescue vascular self-assembly in a manner that is material-dependent. HPTG provides a route for accelerating pre-clinical progress for diverse medical applications including tissue therapy, cancer biomedicine, and regenerative medicine.
Our goal is to develop a general strategy to decompose a random variable X into multiple independent random variables, without sacrificing any information about unknown parameters. A recent paper showed that for some well-known natural exponential families, X can be thinned into independent random variables $X^{(1)}, \ldots, X^{(K)}$, such that $X = \sum_{k=1}^{K} X^{(k)}$. In this paper, we generalize their procedure by relaxing this summation requirement and simply asking that some known function of the independent random variables exactly reconstruct X. This generalization of the procedure serves two purposes. First, it greatly expands the families of distributions for which thinning can be performed. Second, it unifies sample splitting and data thinning, which on the surface seem to be very different, as applications of the same principle. This shared principle is sufficiency. We use this insight to perform generalized thinning operations for a diverse set of families.
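The summation case described above is easy to illustrate for the Poisson family, where thinning amounts to a multinomial split of each count:

```python
import numpy as np

rng = np.random.default_rng(2)

# Poisson thinning: if X ~ Poisson(lam), splitting its counts multinomially
# with probabilities (1/K, ..., 1/K) yields K mutually independent
# Poisson(lam / K) draws whose sum reconstructs X exactly.
lam, K = 10.0, 3
X = rng.poisson(lam, size=1000)

folds = np.array([rng.multinomial(x, [1.0 / K] * K) for x in X])  # shape (1000, K)

assert (folds.sum(axis=1) == X).all()  # X is a known function (the sum) of the folds
# Each column is marginally Poisson(lam / K); check the empirical means.
print(folds.mean(axis=0))
```

Here the "known function" reconstructing X is the sum; the generalization in the paper allows other reconstruction functions, which is what extends thinning beyond convolution-closed families like the Poisson.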
We argue that algorithmic models, though powerful and appropriate in some circumstances, rely on just as many tenuous assumptions as parametric probabilistic models; these assumptions, their violations, and the ethical consequences of these violations are simply obscured within a black box. We advocate for a future in which statisticians play a central role in bridging the gap between Breiman's two cultures.