Model-Based Clustering of High-Dimensional Longitudinal Data via Regularization

Yang, Luoying; Wu, Tong Tong

doi:10.1111/biom.13672

Cited by 7 publications

(8 citation statements)

References 31 publications

“…It is obvious that these other approaches fail to consider the relationship between the covariates and the outcome of interest during the clustering process as they only make such a connection after the clustering has been performed. In addition, limited work has been done on the clustering of longitudinal high-dimensional data [19]. The longitudinal latent class analysis may be used to consider the correlation inherent in time-dependent, repeated-measure observations, with the limitation that all time points must be identical across subjects.…”

Section: Discussion (mentioning)

confidence: 99%

“…In this paper, we consider the clustering of correlated observations with categorical outcomes and high-dimensional microbiome data. The method used here is a novel, non-trivial extension of the one detailed in Yang and Wu [19], which is a mixture-model-based clustering method for longitudinal data with regularization to enforce a variable selection of high-dimensional covariates. We extend this by considering categorical outcomes as opposed to Gaussian outcomes, which pose unique challenges of their own, as the Yang and Wu method only considers Gaussian outcomes.…”

Section: Discussion (mentioning)

confidence: 99%

Novel Clustering Methods Identified Three Caries Status-Related Clusters Based on Oral Microbiome in Thai Mother–Child Dyads

Manning, Xiao, et al. 2023

Genes

Self Cite

Early childhood caries (ECC) is a disease that globally affects pre-school children. It is important to identify both protective and risk factors associated with this disease. This paper examined a set of saliva samples of Thai mother–child dyads and aimed to analyze how the maternal factors and oral microbiome of the dyads influence the development of ECC. However, heterogeneous latent subpopulations may exist that have different characteristics in terms of caries development. Therefore, we introduce a novel method to cluster the correlated outcomes of dependent observations while selecting influential independent variables to unearth latent groupings within this dataset and reveal their association in each group. This paper describes the discovery of three heterogeneous clusters in the dataset, each with its own unique mother–child outcome trend, as well as identifying several microbial factors that contribute to ECC. Significantly, the three identified clusters represent three typical clinical conditions in which mother–child dyads have typical (cluster 1), high–low (cluster 2), and low–high caries experiences (cluster 3) compared to the overall trend of mother–child caries status. Intriguingly, the variables identified as the driving attributes of each cluster, including specific taxa, have the potential to be used in the future as caries preventive measures.

“…A novel one-step procedure [33] was used to perform clustering and variable selection simultaneously via a mixture of linear mixed-effects models with shrinkage penalties on both fixed effects and random effects.…”

Section: Discussion (mentioning)

confidence: 99%

“…As one can see from the discussion above, post-selection inference in clustering methods is even more challenging than the inference problem for single models because one should consider not only the within-cluster but also between-cluster significance. In the paper of Yang and Wu [33], the authors proved that joint convergence rate of the fixed and random effects when both dimensions grow at an exponential rate of sample size within clusters (i.e., for a homogeneous population). Under the same setting, they also proved the sparsistency property.…”

Section: Discussion (mentioning)

confidence: 99%

Clustering of longitudinal physical activity trajectories among young females with selection of associated factors

Yang, Young 2022

PLoS ONE

Self Cite

We examined multi-level factors related to the longitudinal physical activity trajectories of adolescent girls to determine the important predictors for physical activity. The Trial of Activity in Adolescent Girls (TAAG) Maryland site recruited participants at age 14 (n = 566) and followed up with these girls at age 17 (n = 553) and age 23 (n = 442). Individual, social factors and perceived environmental factors were assessed by questionnaire; body mass index was measured at age 14 and age 17, and self-reported at age 23. Neighborhood factors were assessed by geographic information systems. The outcome, moderate-to-vigorous physical activity (MVPA) minutes in a day, was assessed from accelerometers. A mixture of linear mixed-effects models with double penalization on fixed effects and random effects was used to identify the intrinsic grouping of participants with similar physical activity trajectory patterns and the most relevant predictors within the groups simultaneously. Three clusters of participants were identified. Two hundred and forty participants were clustered as “maintainers” and had consistently low MVPA over time; 289 participants were clustered as “decreasers” who had decreasing MVPA over time; 39 participants were grouped as “increasers” and had increasing MVPA over time. Each of the three clusters has its own cluster-specific factors identified using the clustering method, indicating that each cluster has unique characteristics.

“…A simultaneous penalised linear mixed model (SP-LMM), as implemented in the splmm R package, was fitted for each dataset. This involves a feature selection in a high-dimensional longitudinal setting [14]. A few metabolites and proteins with the largest absolute effect size were considered to be potentially associated with the CVD onset.…”

Section: Methods (mentioning)

confidence: 99%

Individual reference intervals in practice: A guide to personalise clinical and omics level data with IRIS

Pusparum, Thas, Ertaylan 2022

Preprint

Reference intervals (RI) are the best-established methodology used for the interpretation of numerical clinical level data in healthcare and clinical practice. As the test results are interpreted by comparing with the (population-derived) reference intervals, the quality of the calculation and implementation of reference intervals play a major role in decision-making process at the subject level. Here we describe the IRIS workflow to compute Individual Reference Intervals (IRI) based on multiple “healthy” data points from the same subjects and also utilising peers’ test results. We have improved the IRI models so they allow for covariate adjustments, such as sex and age. The IRI is expected to play pivotal roles in i) early detection of disease transition in chronic diseases by facilitating the detection of small deviations in clinical measurements, ii) monitoring personal disease progression, either using the standard clinical biochemistry test results or the omics level data. We demonstrate the utility of IRI in clinical and omics level data (proteomics and metabolomics) from two different longitudinal studies, including prior data processing and data quality check procedures. We have created an integrated application IRIS incorporating all described steps in an easy-to-use tool in research and/or clinical practice. We compute the IRI estimates in a healthy population to demonstrate its diagnostic utility in chronic diseases and from a diseased cohort to demonstrate its potential in disease monitoring.
