Genome-wide association studies have revolutionized our understanding of the genetic underpinnings of cardiometabolic disease. Yet the inadequate representation of individuals of diverse ancestral backgrounds in these studies may undercut their ultimate potential for both public health and precision medicine. The goal of this review is to describe why it is imperative to study the populations most affected by cardiometabolic disease in order to better understand the genetic underpinnings of the disease. We support this premise by describing the current variation in the global burden of cardiometabolic disease and by emphasizing the importance of building a globally and ancestrally representative genetics evidence base for the identification of population-specific variants, fine-mapping, and polygenic risk score estimation. We discuss the important ethical, legal, and social implications of increasing ancestral diversity in genetic studies of cardiometabolic disease and the challenges that arise from (1) the lack of diversity in current reference populations and available analytic samples and (2) the unequal generation of health-associated genomic data and their prediction accuracies. Despite these challenges, we conclude that additional, unprecedented opportunities lie ahead for public health genomics and the realization of precision medicine, provided that the gap in diversity can be systematically addressed. Achieving this goal will require concerted efforts by social, academic, professional, and regulatory stakeholders and communities, and these efforts must be based on principles of equity and social justice.
Structural equation models (SEMs) are widely used to handle multiequation systems that involve latent variables, multiple indicators, and measurement error. Maximum likelihood (ML) and diagonally weighted least squares (DWLS) dominate the estimation of SEMs with continuous or categorical endogenous variables, respectively. When a model is correctly specified, ML and DWLS function well. But in the face of incorrect structures or nonconvergence, their performance can seriously deteriorate. The model-implied instrumental variable, two-stage least squares (MIIV-2SLS) estimator estimates and tests individual equations, is more robust to misspecifications, and is noniterative, thus avoiding nonconvergence. This article is an overview and tutorial on MIIV-2SLS. It reviews the six major steps in using MIIV-2SLS: (a) model specification; (b) model identification; (c) latent to observed (L2O) variable transformation; (d) finding MIIVs; (e) using 2SLS; and (f) tests of overidentified equations. Each step is illustrated using a running empirical example from Reisenzein's (1986) randomized experiment on helping behavior. We also explain and illustrate the analytic conditions under which an equation estimated with MIIV-2SLS is robust to structural misspecifications. We include additional sections on MIIV approaches using a covariance matrix and mean vector as data input, conducting multilevel SEM, analyzing categorical endogenous variables, causal inference, and extensions and applications. Online supplemental material illustrates input code for all examples and simulations using the R package MIIVsem.
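To make steps (c) through (e) concrete, the following is a minimal sketch in Python with NumPy, not the MIIVsem code from the article's supplement. It assumes a hypothetical one-factor model with four indicators and simulated data (not the Reisenzein example); the loadings, error variances, sample size, and the helper name `two_stage_least_squares` are all illustrative assumptions. With y1 as the scaling indicator, the L2O transformation substitutes eta = y1 - e1 into the equation for y2, giving y2 = lambda2 * y1 + (e2 - lambda2 * e1). Because y1 is correlated with that composite error, ordinary least squares is biased, while y3 and y4 qualify as model-implied instruments under this model because their errors are uncorrelated with e1 and e2.

```python
import numpy as np


def two_stage_least_squares(y, X, Z):
    """2SLS estimate of y = X b + u, using the columns of Z as instruments.

    Hypothetical helper for illustration; not part of any package.
    """
    # Stage 1: project the regressors onto the space spanned by the instruments.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Stage 2: regress the outcome on the projected regressors.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


# Simulated data (assumed values): one latent factor, four indicators,
# mutually uncorrelated measurement errors.
rng = np.random.default_rng(0)
n = 20_000
eta = rng.normal(size=n)
lambda2 = 0.8  # loading we try to recover
y1 = eta + rng.normal(scale=0.7, size=n)  # scaling indicator, loading fixed to 1
y2 = lambda2 * eta + rng.normal(scale=0.7, size=n)
y3 = 1.2 * eta + rng.normal(scale=0.7, size=n)
y4 = 0.6 * eta + rng.normal(scale=0.7, size=n)

# L2O-transformed equation for y2: intercept plus observed y1 as regressor.
ones = np.ones(n)
X = np.column_stack([ones, y1])
# Model-implied instruments for this equation: y3 and y4 (plus the intercept).
Z = np.column_stack([ones, y3, y4])

ols_slope = np.linalg.lstsq(X, y2, rcond=None)[0][1]
miiv_slope = two_stage_least_squares(y2, X, Z)[1]

print(f"true loading:   {lambda2:.3f}")
print(f"OLS slope:      {ols_slope:.3f}")   # attenuated by measurement error in y1
print(f"MIIV-2SLS slope: {miiv_slope:.3f}")
```

Each equation is estimated on its own in this approach, so the same helper could be reused for the y3 and y4 equations with their own instrument sets. The overidentification tests in step (f) are not shown here.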
Summary:
Advances in technologies to measure a broad set of exposures have led to a range of exposome research efforts. Yet, these efforts have insufficiently integrated methods that incorporate genetic data to strengthen causal inference, despite evidence that many exposome-associated phenotypes are heritable.
Objective:
We demonstrate how integration of methods and study designs that incorporate genetic data can strengthen causal inference in exposomics research by helping address six challenges: reverse causation and unmeasured confounding, comprehensive examination of phenotypic effects, low efficiency, replication, multilevel data integration, and characterization of tissue-specific effects. Examples are drawn from studies of biomarkers and health behaviors, exposure domains where the causal inference methods we describe are most often applied.
Discussion:
Technological, computational, and statistical advances in genotyping, imputation, and analysis, combined with broad data sharing and cross-study collaborations, offer multiple opportunities to strengthen causal inference in exposomics research. Full application of these opportunities will require an expanded understanding of genetic variants that predict exposome phenotypes as well as an appreciation that the utility of genetic variants for causal inference will vary by exposure and may depend on large sample sizes. However, several of these challenges can be addressed through international scientific collaborations that prioritize data sharing. Ultimately, we anticipate that efforts to better integrate methods that incorporate genetic data will extend the reach of exposomics research by helping address the challenges of comprehensively measuring the exposome and its health effects across studies, the life course, and in varied contexts and diverse populations.
https://doi.org/10.1289/EHP9098