Abhishek Kaul scite author profile

Motivation: An important feature of microbiome count data is the presence of a large number of zeros. A common strategy to handle these excess zeros is to add a small number, called a pseudo-count (e.g., 1). Other strategies use various probability models to model the excess zero counts. Although adding a pseudo-count is simple and widely used, as demonstrated in this paper, it is not ideal. On the other hand, methods that model excess zeros using a probability model often make an implicit assumption that all zeros can be explained by a common probability model. As described in this article, this is not always recommended, as there are potentially three types/sources of zeros in microbiome data. The purpose of this paper is to develop a simple methodology to identify and accommodate three different types of zeros and to test hypotheses regarding the relative abundance of taxa in two or more experimental groups. Another major contribution of this paper is to perform constrained (directional or ordered) inference when there are more than two ordered experimental groups (e.g., subjects ordered by diet or age groups or environmental exposure groups). As far as we know, this is the first paper that addresses such problems in the analysis of microbiome data. Results: Using extensive simulation studies, we demonstrate that the proposed methodology controls the false discovery rate at a desired level of significance while competing well in terms of power with DESeq2, a popular procedure derived from the RNASeq literature. As expected, the method using pseudo-counts tends to be very conservative, and the classical t-test that ignores the underlying simplex structure in the data has an inflated FDR.

Household composition and the infant fecal microbiome: The INSPIRE study

Lane

McGuire

et al. 2019

American J Phys Anthropol

Objectives: Establishment and development of the infant gastrointestinal microbiome (GIM) varies cross-culturally and is thought to be influenced by factors such as gestational age, birth mode, diet, and antibiotic exposure. However, there is little data on how the composition of infants' households may play a role, particularly from a cross-cultural perspective. Here, we examined relationships between infant fecal microbiome (IFM) diversity/composition and infants' household size, number of siblings, and number of other household members. Materials and methods: We analyzed 377 fecal samples from healthy, breastfeeding infants across 11 sites in eight different countries (Ethiopia, The Gambia, Ghana, Kenya, Peru, Spain, Sweden, and the United States). Fecal microbial community structure was determined by amplifying, sequencing, and classifying (to the genus level) the V1-V3 region of the bacterial 16S rRNA gene. Surveys administered to infants' mothers identified household members and composition. Results: Our results indicated that household composition (represented by the number of cohabitating siblings and other household members) did not have a measurable impact on the bacterial diversity, evenness, or richness of the IFM. However, we observed that variation in household composition categories did correspond to differential relative abundances of specific taxa, namely Lactobacillus, Clostridium, Enterobacter, and Klebsiella. Discussion: This study, to our knowledge, is the largest cross-cultural study to date examining the association between household composition and the IFM. Our results indicate that the social environment of infants (represented here by the proxy of household composition) may influence the bacterial composition of the infant GIM, although the mechanism is unknown.
A higher number and diversity of cohabitants and potential caregivers may facilitate social transmission of beneficial bacteria to the infant gastrointestinal tract, by way of shared environment or through direct

Structural zeros in high-dimensional data with applications to microbiome studies

Kaul

Davidov

Peddada

2017

Biostat

This paper is motivated by the recent interest in the analysis of high-dimensional microbiome data. A key feature of these data is the presence of "structural zeros" which are microbes missing from an observation vector due to an underlying biological process and not due to error in measurement. Typical notions of missingness are unable to model these structural zeros. We define a general framework which allows for structural zeros in the model and propose methods of estimating sparse high-dimensional covariance and precision matrices under this setup. We establish error bounds in the spectral and Frobenius norms for the proposed estimators and empirically verify them with a simulation study. The proposed methodology is illustrated by applying it to the global gut microbiome data of Yatsunenko and others (2012. Human gut microbiome viewed across age and geography. Nature 486, 222-227). Using our methodology we classify subjects according to the geographical location on the basis of their gut microbiome.

Inference on the change point under a high dimensional sparse mean shift

Kaul

Fotopoulos

Jandhyala

et al. 2021

Electron. J. Statist.

We study a plug-in least squares estimator for the change point parameter where the change is in the mean of a high dimensional random vector under subgaussian or subexponential distributions. We obtain sufficient conditions under which this estimator possesses sufficient adaptivity against plug-in estimates of mean parameters in order to yield an optimal rate of convergence O_p(ξ^{-2}) in the integer scale. This rate is preserved while allowing high dimensionality as well as a potentially diminishing jump size) in the subgaussian and subexponential cases, respectively. Here s, p, T and l_T represent a sparsity parameter, model dimension, sampling period and the separation of the change point from its parametric boundary, respectively. Moreover, since the rate of convergence is free of s, p and logarithmic terms of T, it allows the existence of limiting distributions under high dimensional asymptotics. These distributions are then derived as the argmax of a two-sided negative drift Brownian motion or a two-sided negative drift random walk under vanishing and non-vanishing jump size regimes, respectively, thereby allowing inference on the change point parameter. Feasible algorithms for implementation of the proposed methodology are provided. Theoretical results are supported with Monte Carlo simulations.

Weighted ℓ1-penalized corrected quantile regression for high dimensional measurement error models

Kaul

Koul

2015

Journal of Multivariate Analysis


Analysis of Microbiome Data in the Presence of Excess Zeros