High-throughput sequencing technologies have enabled large-scale studies of the role of the human microbiome in health conditions and diseases. Microbial community-level association testing, a critical step in establishing the connection between overall microbiome composition and an outcome of interest, is now routinely performed in many studies. However, current microbiome association tests all focus on a single outcome. It has become increasingly common for a microbiome study to collect multiple, possibly related, outcomes to maximize the power of discovery. Because these outcomes may share common mechanisms, analyzing them jointly can amplify the association signal and improve statistical power to detect potential associations. We propose the multivariate microbiome regression-based kernel association test (MMiRKAT) for testing association between multiple continuous outcomes and overall microbiome composition, where the kernel used in MMiRKAT is based on Bray-Curtis or UniFrac distance. MMiRKAT directly regresses all outcomes on the microbiome profiles via a semiparametric kernel machine regression framework, which allows for covariate adjustment, and evaluates the association via a variance-component score test. Because most current microbiome studies have small sample sizes, a novel small-sample correction procedure is implemented in MMiRKAT to correct for the conservativeness of the association test when the sample size is small or moderate. The proposed method is assessed via simulation studies and an application to a real data set examining the association between host gene expression and mucosal microbiome composition. We demonstrate that MMiRKAT is more powerful than the large-sample-based multivariate kernel association test, while controlling the type I error. A free implementation of MMiRKAT in R is available at http://research.fhcrc.org/wu/en.html.
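To make the phrase "kernel based on Bray-Curtis distance" concrete, the sketch below shows one common way to turn a pairwise distance matrix into a positive semidefinite kernel: double-centering the squared distances and then clipping negative eigenvalues to zero. This is an illustrative Python sketch, not the MMiRKAT R implementation; the function names `bray_curtis` and `distance_to_kernel` and the choice of eigenvalue correction are assumptions, since the abstract does not specify these details.

```python
import numpy as np

def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity between rows (samples) of a count table."""
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(counts[i] - counts[j]).sum()
            den = (counts[i] + counts[j]).sum()
            d[i, j] = d[j, i] = num / den if den > 0 else 0.0
    return d

def distance_to_kernel(d):
    """Double-center the squared distances, then force positive semidefiniteness.

    Assumption: negative eigenvalues are clipped to zero; other corrections exist.
    """
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    k = -0.5 * centering @ (d ** 2) @ centering
    k = (k + k.T) / 2  # remove floating-point asymmetry
    eigvals, eigvecs = np.linalg.eigh(k)
    eigvals = np.clip(eigvals, 0.0, None)
    return (eigvecs * eigvals) @ eigvecs.T

# Hypothetical toy table: 3 samples (rows) by 3 taxa (columns).
toy_counts = [[10, 0, 5], [0, 10, 5], [5, 5, 5]]
toy_distance = bray_curtis(toy_counts)
toy_kernel = distance_to_kernel(toy_distance)
```

The resulting matrix can serve as the sample-similarity input to a kernel machine regression; a UniFrac distance matrix computed elsewhere could be passed to `distance_to_kernel` in the same way.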
We propose a censored quantile regression estimator motivated by unbiased estimating equations. Under the usual conditional independence assumption of the survival time and the censoring time given the covariates, we show that the proposed estimator is consistent and asymptotically normal. We develop an efficient computational algorithm which uses existing quantile regression code. As a result, bootstrap-type inference can be efficiently implemented. We illustrate the finite-sample performance of the proposed method by simulation studies and analysis of a survival data set. Comment: Published at http://dx.doi.org/10.3150/11-BEJ388 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
Panel count data often occur in long-term studies that concern the occurrence rate of a recurrent event. Methods have been proposed for regression analysis of panel count data, but most of the existing research focuses on situations where observation times are independent of longitudinal response variables and therefore relies on conditional inference procedures given the observation times. This article considers a different situation where the independence assumption may not hold, that is, the observation times and the response variable may be correlated. For inference, estimating equation approaches are proposed for estimation of regression parameters, and both large and finite sample properties of the proposed estimates are established. An illustrative example from a cancer study is provided.
This paper discusses regression analysis of panel count data that often arise in longitudinal studies concerning occurrence rates of certain recurrent events. Panel count data mean that each study subject is observed only at discrete time points rather than under continuous observation. Furthermore, both observation and follow-up times can vary from subject to subject and may be correlated with the recurrent events. For inference, we propose some shared frailty models and develop estimating equations for estimation of regression parameters. The proposed estimates are consistent and asymptotically normal. The finite sample properties of the proposed estimates are investigated through simulation, and an illustrative example from a cancer study is provided.
Recurrent event data occur in many clinical and observational studies (Cook and Lawless, Analysis of recurrent event data, 2007) and in these situations, there may exist a terminal event such as death that is related to the recurrent event of interest (Ghosh and Lin, Biometrics 56:554-562, 2000; Wang et al., J Am Stat Assoc 96:1057-1065, 2001; Huang and Wang, J Am Stat Assoc 99:1153-1165, 2004; Ye et al., Biometrics 63:78-87, 2007). In addition, sometimes there may exist more than one type of recurrent events, that is, one faces multivariate recurrent event data with some dependent terminal event (Chen and Cook, Biostatistics 5:129-143, 2004). It is apparent that for the analysis of such data, one has to take into account the dependence both among different types of recurrent events and between the recurrent and terminal events. In this paper, we propose a joint modeling approach for regression analysis of the data and both finite and asymptotic properties of the resulting estimates of unknown parameters are established. The methodology is applied to a set of bivariate recurrent event data arising from a study of leukemia patients.
Event history studies occur in many fields including economics, medical studies and social science. In such studies concerning some recurrent events, two types of data have been extensively discussed in the literature. One is recurrent event data that arise if study subjects are monitored or observed continuously. In this case, the observed information provides the times of all occurrences of the recurrent events of interest. The other is panel count data, which occur if the subjects are monitored or observed only periodically. This can happen if the continuous observation is too expensive or not practical and in this case, only the numbers of occurrences of the events between subsequent observation times are available. In this paper, we discuss a third type of data, which is a mixture of recurrent event and panel count data and for which there exists little literature. For regression analysis of such data, a marginal mean model is presented and we propose an estimating equation-based approach for estimation of regression parameters. A simulation study is conducted to assess the finite sample performance of the proposed methodology and indicates that it works well for practical situations. Finally it is applied to a motivating study on childhood cancer survivors.
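The distinction drawn above between recurrent event data and panel count data can be illustrated with a short Python sketch. It takes the exact event times that continuous monitoring would record and reduces them to the interval counts that periodic observation would yield. The function name `panel_counts` and the toy numbers are hypothetical, and the sketch assumes follow-up starts at time 0 and that an event falling exactly on an observation time is counted in the interval ending there.

```python
from bisect import bisect_right

def panel_counts(event_times, observation_times):
    """Number of events in each interval (t_{j-1}, t_j], with t_0 = 0."""
    events = sorted(event_times)
    counts = []
    previous_cumulative = 0
    for t in sorted(observation_times):
        cumulative = bisect_right(events, t)  # events with time <= t
        counts.append(cumulative - previous_cumulative)
        previous_cumulative = cumulative
    return counts

# Recurrent event data: every occurrence time is known.
toy_events = [0.8, 2.1, 2.7, 5.5]
# Panel count data: the subject is only seen at these visits.
toy_visits = [1.0, 3.0, 6.0]
toy_counts = panel_counts(toy_events, toy_visits)  # [1, 2, 1]
```

A subject in the mixed setting described above would contribute exact times over some stretches of follow-up and only counts of this kind over others.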
This paper discusses regression analysis of multivariate current status failure time data (The Statistical Analysis of Interval-censored Failure Time Data. Springer: New York, 2006), which occur quite often in, for example, tumorigenicity experiments and epidemiologic investigations of the natural history of a disease. For the problem, several marginal approaches have been proposed that model each failure time of interest individually (Biometrics 2000; 56:940-943; Statist. Med. 2002; 21:3715-3726). In this paper, we present a full likelihood approach based on the proportional hazards frailty model. For estimation, an Expectation Maximization (EM) algorithm is developed, and simulation studies suggest that the presented approach performs well for practical situations. The approach is applied to a set of bivariate current status data arising from a tumorigenicity experiment.
Variable selection is an important issue in all regression analysis and in this paper, we discuss this in the context of regression analysis of recurrent event data. Recurrent event data often occur in long-term studies in which individuals may experience the events of interest more than once, and their analysis has recently attracted a great deal of attention (Andersen et al., Statistical models based on counting processes, 1993; Cook and Lawless, Biometrics 52:1311-1323, 1996, The analysis of recurrent event data, 2007; Cook et al., Biometrics 52:557-571, 1996; Lawless and Nadeau, Technometrics 37:158-168, 1995; Lin et al., J R Stat Soc B 62:711-730, 2000). However, it seems that there are no established approaches to variable selection for recurrent event data. For the problem, we adopt the idea behind the nonconcave penalized likelihood approach proposed in Fan and Li (J Am Stat Assoc 96:1348-1360, 2001) and develop a nonconcave penalized estimating function approach. The proposed approach selects variables and estimates regression coefficients simultaneously, and an algorithm is presented for this process. We show that the proposed approach performs as well as the oracle procedure in that it yields the estimates as if the correct submodel were known. Simulation studies are conducted for assessing the performance of the proposed approach and suggest that it works well for practical situations. The proposed methodology is illustrated by using the data from a chronic granulomatous disease study.
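As a concrete instance of a nonconcave penalty, the Python sketch below evaluates the smoothly clipped absolute deviation (SCAD) penalty introduced in Fan and Li (2001). The abstract does not say which penalty the estimating function approach uses, so treating SCAD as the example is an assumption; the function name `scad_penalty` is hypothetical, and a = 3.7 is the default value suggested by Fan and Li.

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(|theta|), evaluated elementwise.

    Linear near zero (like the lasso), quadratic in a transition zone,
    and constant for large |theta| so that large coefficients are not shrunk.
    """
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t
    quadratic = (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    constant = lam ** 2 * (a + 1) / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))

# Toy evaluation with lam = 1: one value from each of the three regions.
toy_values = scad_penalty([0.5, 2.0, 10.0], lam=1.0)
```

The flat region for large coefficients is what allows a penalized procedure of this kind to behave like the oracle procedure mentioned above, since strong signals are left essentially unpenalized.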