Sources of population heterogeneity may or may not be observed. If the sources of heterogeneity are observed (e.g., gender), the sample can be split into groups and the data analyzed with methods for multiple groups. If the sources of population heterogeneity are unobserved, the data can be analyzed with latent class models. Factor mixture models are a combination of latent class and common factor models and can be used to explore unobserved population heterogeneity. Observed sources of heterogeneity can be included as covariates. The different ways to incorporate covariates correspond to different conceptual interpretations. These are discussed in detail. Characteristics of factor mixture modeling are described in comparison to other methods designed for data stemming from heterogeneous populations. A step-by-step analysis of a subset of data from the Longitudinal Survey of American Youth illustrates how factor mixture models can be applied in an exploratory fashion to data collected at a single time point.
Latent variable models exist with continuous, categorical, or both types of latent variables. The role of latent variables is to account for systematic patterns in the observed responses. This article has two goals: (a) to establish whether, based on observed responses, it can be decided that an underlying latent variable is continuous or categorical, and (b) to quantify the effect of sample size and class proportions on making this distinction. Latent variable models with categorical, continuous, or both types of latent variables are fitted to simulated data generated under different types of latent variable models. If an analysis is restricted to fitting continuous latent variable models assuming a homogeneous population and data stem from a heterogeneous population, overextraction of factors may occur. Similarly, if an analysis is restricted to fitting latent class models, overextraction of classes may occur if covariation between observed variables is due to continuous factors. For the data-generating models used in this study, comparing the fit of different exploratory factor mixture models usually allows one to distinguish correctly between categorical and/or continuous latent variables. Correct model choice depends on class separation and within-class sample size.
Factor mixture models are designed for the analysis of multivariate data obtained from a population consisting of distinct latent classes. A common factor model is assumed to hold within each of the latent classes. Factor mixture modeling involves obtaining estimates of the model parameters, and may also be used to assign subjects to their most likely latent class. This simulation study investigates aspects of model performance such as parameter coverage and correct class membership assignment and focuses on covariate effects, model size, and class-specific versus class-invariant parameters. When fitting true models, parameter coverage is good for most parameters even for the smallest class separation investigated in this study (0.5 SD between 2 classes). The same holds for convergence rates. Correct class assignment is unsatisfactory for the small class separation without covariates, but improves dramatically with increasing separation, covariate effects, or both. Model performance is not influenced by the differences in model size investigated here. Class-specific parameters may improve some aspects of model performance but negatively affect other aspects.
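As a reading aid, the model structure described in this abstract can be written out as follows. This is a sketch in conventional notation, not taken from the article itself: the symbols $K$, $\pi_k$, $\nu_k$, $\Lambda_k$, $\alpha_k$, $\Psi_k$, and $\Theta_k$ are assumptions introduced here for illustration, and normally distributed factors and residuals within class are assumed.

```latex
% Within latent class k (k = 1, ..., K), a common factor model is assumed to hold:
\begin{align}
  y_i \mid (c_i = k) &= \nu_k + \Lambda_k \eta_i + \varepsilon_i, \\
  \eta_i \mid (c_i = k) &\sim N(\alpha_k, \Psi_k), \qquad
  \varepsilon_i \mid (c_i = k) \sim N(0, \Theta_k).
\end{align}
% The marginal density is a finite mixture over classes with proportions \pi_k:
\begin{equation}
  f(y_i) = \sum_{k=1}^{K} \pi_k \,
  N\!\left(y_i;\ \nu_k + \Lambda_k \alpha_k,\ 
  \Lambda_k \Psi_k \Lambda_k^{\top} + \Theta_k\right),
  \qquad \sum_{k=1}^{K} \pi_k = 1.
\end{equation}
% Assignment to the most likely class uses the posterior class probability:
\begin{equation}
  P(c_i = k \mid y_i) =
  \frac{\pi_k \, f_k(y_i)}{\sum_{j=1}^{K} \pi_j \, f_j(y_i)},
\end{equation}
% where f_k denotes the class-specific normal density from the mixture above.
```

In this notation, a class-invariant parameter is one constrained to be equal across classes (for example, $\Lambda_k = \Lambda$ for all $k$), whereas a class-specific parameter keeps its subscript $k$.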
Cannabis is the most widely produced and consumed illicit psychoactive substance worldwide. Occasional cannabis use can progress to frequent use, abuse and dependence with all known adverse physical, psychological and social consequences. Individual differences in cannabis initiation are heritable (40–48%). The International Cannabis Consortium was established with the aim to identify genetic risk variants of cannabis use. We conducted a meta-analysis of genome-wide association data of 13 cohorts (N=32 330) and four replication samples (N=5627). In addition, we performed a gene-based test of association, estimated single-nucleotide polymorphism (SNP)-based heritability and explored the genetic correlation between lifetime cannabis use and cigarette use using LD score regression. No individual SNPs reached genome-wide significance. Nonetheless, gene-based tests identified four genes significantly associated with lifetime cannabis use: NCAM1, CADM2, SCOC and KCNT2. Previous studies reported associations of NCAM1 with cigarette smoking and other substance use, and those of CADM2 with body mass index, processing speed and autism disorders, which are phenotypes previously reported to be associated with cannabis use. Furthermore, we showed that, combined across the genome, all common SNPs explained 13–20% (P<0.001) of the liability of lifetime cannabis use. Finally, there was a strong genetic correlation (rg=0.83; P=1.85 × 10⁻⁸) between lifetime cannabis use and lifetime cigarette smoking implying that the SNP effect sizes of the two traits are highly correlated. This is the largest meta-analysis of cannabis GWA studies to date, revealing important new insights into the genetic pathways of lifetime cannabis use. Future functional studies should explore the impact of the identified genes on the biological mechanisms of cannabis use.
Few genome-wide association studies (GWAS) account for environmental exposures, like smoking, potentially impacting the overall trait variance when investigating the genetic contribution to obesity-related traits. Here, we use GWAS data from 51,080 current smokers and 190,178 nonsmokers (87% European descent) to identify loci influencing BMI and central adiposity, measured as waist circumference and waist-to-hip ratio both adjusted for BMI. We identify 23 novel genetic loci, and 9 loci with convincing evidence of gene-smoking interaction (GxSMK) on obesity-related traits. We show consistent direction of effect for all identified loci and significance for 18 novel and for 5 interaction loci in an independent study sample. These loci highlight novel biological functions, including response to oxidative stress, addictive behaviour, and regulatory functions emphasizing the importance of accounting for environment in genetic analyses. Our results suggest that tobacco smoking may alter the genetic susceptibility to overall adiposity and body fat distribution.
Factor mixture models (FMMs) are latent variable models with categorical and continuous latent variables which can be used as a model-based approach to clustering. A previous paper covered the results of a simulation study showing that in the absence of model violations, it is usually possible to choose the correct model when fitting a series of models with different numbers of classes and factors within class. The response format in the first study was limited to normally distributed outcomes. The current paper has two main goals: first, to replicate parts of the first study with 5-point Likert scale and binary outcomes, and second, to address the issue of testing class invariance of thresholds and loadings. Testing for class invariance of parameters is important in the context of measurement invariance and when using mixture models to approximate non-normal distributions. Results show that it is possible to discriminate between latent class models and factor models even if responses are categorical. Comparing models with and without class-specific parameters can lead to incorrectly accepting parameter invariance if the compared models differ substantially with respect to the number of estimated parameters. The simulation study is complemented with an illustration of a factor mixture analysis of ten binary depression items obtained from a female subsample of the Virginia Twin Registry.