In observational studies, type-2 diabetes (T2D) is associated with an increased risk of coronary heart disease (CHD), yet interventional trials have shown no clear effect of glucose-lowering on CHD. Confounding may therefore have influenced these observational estimates. Here we use Mendelian randomization to obtain unconfounded estimates of the influence of T2D and fasting glucose (FG) on CHD risk. Using multiple genetic variants associated with T2D and FG, we find that risk of T2D increases CHD risk (odds ratio (OR)=1.11 (1.05–1.17), per unit increase in odds of T2D, P=8.8 × 10−5; using data from 34,840/114,981 T2D cases/controls and 63,746/130,681 CHD cases/controls). FG in non-diabetic individuals tends to increase CHD risk (OR=1.15 (1.00–1.32), per mmol/l, P=0.05; 133,010 non-diabetic individuals and 63,746/130,681 CHD cases/controls). These findings provide evidence supporting a causal relationship between T2D and CHD and suggest that long-term trials may be required to discern the effects of T2D therapies on CHD risk.
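For readers unfamiliar with how effects from multiple genetic variants are combined in such analyses, the display below gives the standard per-variant Wald ratio and its inverse-variance-weighted (IVW) combination. This is shown purely for context; the abstract does not state which estimator was used, and the symbols are defined here for illustration only.

```latex
% Wald ratio for variant j and the inverse-variance-weighted (IVW) combination,
% a standard multi-variant Mendelian randomization estimator (context only).
% \hat\beta_{Xj}: effect of variant j on the exposure (T2D liability or FG);
% \hat\beta_{Yj}, \sigma_{Yj}: effect of variant j on CHD and its standard error.
\hat\theta_j = \frac{\hat\beta_{Yj}}{\hat\beta_{Xj}}, \qquad
\hat\theta_{\mathrm{IVW}} =
  \frac{\sum_j \hat\beta_{Xj}\,\hat\beta_{Yj} / \sigma_{Yj}^{2}}
       {\sum_j \hat\beta_{Xj}^{2} / \sigma_{Yj}^{2}}
```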
The genomics era has led to an increase in the dimensionality of data collected in the investigation of biological questions. In this context, dimension-reduction techniques can be used to summarise high-dimensional signals into low-dimensional ones, to further test for association with one or more covariates of interest. This paper revisits one such approach, previously known as principal component of heritability and renamed here as principal component of explained variance (PCEV). As its name suggests, the PCEV seeks a linear combination of outcomes in an optimal manner, by maximising the proportion of variance explained by one or several covariates of interest. By construction, this method optimises power; however, due to its computational complexity, it has unfortunately received little attention in the past. Here, we propose a general analytical PCEV framework that builds on the assets of the original method, i.e. it is conceptually simple and free of tuning parameters. Moreover, our framework extends the range of applications of the original procedure by providing a computationally simple strategy for high-dimensional outcomes, along with exact and asymptotic testing procedures that drastically reduce its computational cost. We investigate the merits of the PCEV using an extensive set of simulations. Furthermore, the use of the PCEV approach is illustrated using three examples taken from the fields of epigenetics and brain imaging.
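To make the optimisation criterion concrete, the following minimal Python sketch (an illustration written for this summary, not the authors' implementation; the simulated data and variable names are chosen here) computes the PCEV for a single covariate as the leading generalized eigenvector of the explained-versus-residual covariance pair.

```python
# Minimal sketch of the PCEV idea: find the linear combination of outcomes
# that maximizes the proportion of variance explained by a covariate.
# Illustrative re-implementation only; not the authors' code.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n, p = 500, 10
x = rng.normal(size=n)                        # single covariate of interest
B = rng.normal(size=p) * 0.3                  # true effect of x on each outcome
Y = np.outer(x, B) + rng.normal(size=(n, p))  # outcomes = signal + noise

# Regress each outcome on x to split the covariance into
# "explained" (between, V_B) and "residual" (within, V_W) parts.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
fitted = X @ coef
resid = Y - fitted
V_B = (fitted - fitted.mean(0)).T @ (fitted - fitted.mean(0)) / n
V_W = resid.T @ resid / n

# PCEV weights: leading generalized eigenvector of (V_B, V_W); the largest
# eigenvalue is the maximized ratio of explained to residual variance.
eigvals, eigvecs = eigh(V_B, V_W)
w = eigvecs[:, -1]
pcev = Y @ w                                  # the PCEV component itself
print("max explained/residual variance ratio:", eigvals[-1])
```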
In this paper, we consider some potential pitfalls of the growing use of quasi-likelihood-based information criteria for longitudinal data to select a working correlation structure in a generalized estimating equation framework. In particular, we examine settings where the fully conditional mean does not equal the marginal mean, as well as hypothesis testing following selection of the working correlation matrix. Our results suggest that the use of any information criterion to select the working correlation matrix is inappropriate when the conditional mean model assumption is violated. We also find that, following selection of the form of the working correlation, the type I error rate differs from the nominal level at moderate sample sizes, but improves as the sample size increases because the selection then concentrates on a single correlation structure. Our results underline the potential dangers that can arise when information criteria are used to select the correlation structure in routine data analysis.
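As a concrete illustration of the selection step being scrutinised, the sketch below (written for this summary, not taken from the paper) fits the same marginal mean model under two working correlation structures and compares their QIC values. The simulated data, variable names, and the availability of GEEResults.qic() in statsmodels are all assumptions of this illustration.

```python
# Sketch of data-driven working-correlation selection via QIC in a GEE fit,
# the kind of selection whose pitfalls the paper examines.
# Assumes a recent statsmodels release where GEEResults.qic() is available.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulate clustered data: 200 subjects, 4 visits each, exchangeable
# within-subject correlation induced by a subject-level random intercept.
rng = np.random.default_rng(1)
n_subj, n_vis = 200, 4
subject = np.repeat(np.arange(n_subj), n_vis)
x = rng.normal(size=n_subj * n_vis)
u = np.repeat(rng.normal(scale=1.0, size=n_subj), n_vis)  # random intercept
y = 0.5 * x + u + rng.normal(size=n_subj * n_vis)
df = pd.DataFrame({"y": y, "x": x, "subject": subject})

# Fit the same marginal mean model under two candidate working correlations.
for name, cov in [("independence", sm.cov_struct.Independence()),
                  ("exchangeable", sm.cov_struct.Exchangeable())]:
    res = smf.gee("y ~ x", groups="subject", data=df,
                  family=sm.families.Gaussian(), cov_struct=cov).fit()
    qic, qicu = res.qic()   # lower QIC would favour this working structure
    print(f"{name:>13}: QIC = {qic:.1f}, beta_x = {res.params['x']:.3f}")
```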
The genomics era has led to an increase in the dimensionality of the data collected to investigate biological questions. In this context, dimension-reduction techniques can be used to summarize high-dimensional signals into low-dimensional ones, to further test for association with one or more covariates of interest. This paper revisits one such approach, previously known as Principal Component of Heritability and renamed here as Principal Component of Explained Variance (PCEV). As its name suggests, the PCEV seeks a linear combination of outcomes in an optimal manner, by maximising the proportion of variance explained by one or several covariates of interest. By construction, this method optimises power; however, limited by its computational complexity, it has unfortunately received little attention in the past. Here, we propose a general analytical PCEV framework that builds on the assets of the original method, i.e. it is conceptually simple and free of tuning parameters. Moreover, our framework extends the range of applications of the original procedure by providing a computationally simple strategy for high-dimensional outcomes, along with exact and asymptotic testing procedures that drastically reduce its computational cost. We investigate the merits of the PCEV using an extensive set of simulations. Furthermore, the use of the PCEV approach is illustrated using three examples taken from the fields of epigenetics and brain imaging.
In epidemiological studies of time-to-event data, a quantity of interest to the clinician and the patient is the risk of an event given a covariate profile. However, methods relying on time matching or risk-set sampling (including Cox regression) eliminate the baseline hazard from the likelihood expression or the estimating function. The baseline hazard then needs to be estimated separately using a non-parametric approach. This leads to step-wise estimates of the cumulative incidence that are difficult to interpret. Using case-base sampling, Hanley & Miettinen (2009) explained how parametric hazard functions can be estimated using logistic regression. Their approach naturally leads to estimates of the cumulative incidence that are smooth-in-time. In this paper, we present the casebase R package, a comprehensive and flexible toolkit for parametric survival analysis. We describe how the case-base framework can also be used in more complex settings: competing risks, time-varying exposure, and variable selection. Our package also includes an extensive array of visualization tools to complement the analysis of time-to-event data. We illustrate all these features through four different case studies.
*SRB and MT contributed equally to this work.
Computations were made on the supercomputer Mammouth-parallèle 2 from Université de Sherbrooke, managed by Calcul Québec and Compute Canada. The operation of this supercomputer is funded by the Canada Foundation for Innovation (CFI), the ministère de l'Économie, de la Science et de l'Innovation du Québec (MESI) and the Fonds de recherche du Québec-Nature et technologies (FRQ-NT). The methylation data used in the data analysis were kindly provided by Marie Hudson, Sasha Bernatsky, Ines Colmegna, and Tomi Pastinen. Finally, the authors would also like to thank Stepan Grinek for bringing to our attention the machine learning and pattern recognition literature on the use of truncated SVD.
In clinical studies of time-to-event data, a quantity of interest to the clinician is their patient's risk of an event. However, methods relying on time matching or risk-set sampling (including Cox regression) eliminate the baseline hazard from the estimating function. As a consequence, the focus has been on reporting hazard ratios instead of survival or cumulative incidence curves. Indeed, reporting patient risk or cumulative incidence requires a separate estimation of the baseline hazard. Using case-base sampling, Hanley & Miettinen (2009) explained how parametric hazard functions can be estimated in continuous-time using logistic regression. Their approach naturally leads to estimates of the survival or risk function that are smooth-in-time. In this paper, we present the casebase R package, a comprehensive and flexible toolkit for parametric survival analysis. We describe how the case-base framework can also be used in more complex settings: non-linear functions of time and non-proportional hazards, competing risks, and variable selection. Our package also includes an extensive array of visualization tools to complement the analysis. We illustrate all these features through three different case studies.
*SRB and MT contributed equally to this work.
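To make the sampling-plus-logistic-regression idea concrete, here is a minimal from-scratch sketch in Python using simulated data. It is an illustration written for this summary, not the casebase package itself: it draws a base series of person-moments proportional to follow-up time, combines it with the case series, and recovers a hazard that is log-linear in time and exposure from an ordinary logistic fit. All data, names, and the offset handling are choices made for this example.

```python
# Minimal from-scratch sketch of case-base sampling with a log-linear hazard.
# Not the casebase R package; simulated data for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 2000
x = rng.binomial(1, 0.5, size=n)              # a binary exposure
# Exponential event times with rate exp(-1 + 0.7*x), plus uniform censoring.
t_event = rng.exponential(1.0 / np.exp(-1.0 + 0.7 * x))
t_cens = rng.uniform(0, 5, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)

# Case series: one person-moment at each observed event time.
case_t, case_x = time[event == 1], x[event == 1]
# Base series: person-moments sampled uniformly over the total follow-up time
# (subjects chosen proportional to their follow-up, then a uniform moment).
m = 10 * len(case_t)
idx = rng.choice(n, size=m, p=time / time.sum())
base_t, base_x = rng.uniform(0, time[idx]), x[idx]

# Logistic regression of case vs base on (time, exposure); the fitted intercept
# equals the hazard intercept plus log(B/b), where B is total person-time and
# b the base-series size, so we correct it to get the log-hazard intercept.
y = np.concatenate([np.ones_like(case_t), np.zeros_like(base_t)])
design = np.column_stack([np.concatenate([case_t, base_t]),
                          np.concatenate([case_x, base_x])])
fit = LogisticRegression(C=1e6).fit(design, y)
b0 = fit.intercept_[0] + np.log(m / time.sum())
print("log-hazard: intercept %.2f, time %.2f, exposure %.2f"
      % (b0, fit.coef_[0, 0], fit.coef_[0, 1]))   # truth: -1.0, 0.0, 0.7
```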