2020
DOI: 10.1093/biomet/asaa007

Sparse semiparametric canonical correlation analysis for data of mixed types

Abstract: Canonical correlation analysis investigates linear relationships between two sets of variables, but often works poorly on modern data sets due to high-dimensionality and mixed data types (continuous/binary/zero-inflated). We propose a new approach for sparse canonical correlation analysis of mixed data types that does not require explicit parametric assumptions. Our main contribution is the use of truncated latent Gaussian copula to model the data with excess zeroes, which allows us to derive a rank-based esti…

Citations: cited by 31 publications (70 citation statements)
References: 39 publications (61 reference statements)

“…More specifically, truncation to zero effects for low sequencing read counts likely obstruct unbiased estimation of negative correlations, and in turn, proportionality. A possible remedy for this data-induced artifact is the application of more advanced semi-parametric correlation estimators that infer latent correlations under data truncation assumptions (21,46). A detailed investigation of semi-parametric and other estimators may provide a promising avenue for future research.…”
Section: Discussion (mentioning)
confidence: 99%
“…Our computational data analysis workflow, available on GitHub and as synapse project (see Data Availability), is fully reproducible, provides all novel shrinkage estimators introduced here, and allows easy extension and comparison to additional data normalization, estimation, and downstream analysis tasks. For instance, future work could include the integration of more advanced zero-replacement strategies (51,52), application of popular data normalization schemes from single-cell data analysis (53) or the application of other correlation (21,46) or proportionality estimators, including those available in the propr package (23). Here, rather than using universal thresholding for sparsifying associations, more advanced selection strategies that control false discovery rates (as available in the propr package (23)) may improve the consistency of the microbial association inference workflows.…”
Section: Discussion (mentioning)
confidence: 99%
“…Σ is estimated using a semi-parametric rank-based approach relying on a truncated Gaussian copula model [50], which can deal with zeros in the data…”
Section: R Package (mentioning)
confidence: 99%
“…For quantitative microbiome data, however, correlation and inverse correlation estimators are not yet available. In this work we propose to take a different approach relying on the recently proposed truncated Gaussian copula framework (Yoon et al, 2018).…”
Section: Semi-parametric Rank-based Correlation and Partial Correl… (mentioning)
confidence: 99%
“…To this end, we first revisit a novel semi-parametric rank-based (SPR) approach to correlation estimation that can naturally deal with the large number of zeros in the data. The SPR estimator is easy to compute and can readily replace the naïve Pearson or rank-based sample correlation estimators, which are often used as a first step in downstream statistical analysis tasks, including principal component analysis, principal coordinate analysis, discriminant analysis, or canonical correlation analysis (Yoon et al, 2018). Here we use the semi-parametric rank-based estimator as a starting point for sparse partial correlation estimation and introduce the Semi-Parametric Rank-based approach for INference in Graphical model (SPRING).…”
Section: Introduction (mentioning)
confidence: 99%
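
The citation statements above describe replacing the sample Pearson correlation with a semi-parametric rank-based estimate of a latent correlation. As a rough illustration of that idea only, the sketch below covers the simplest case: two continuous variables under a Gaussian copula, where the latent correlation is related to Kendall's tau by r = sin(π/2 · τ). This is an assumption-laden toy, not the estimator from the cited paper: the truncated (zero-inflated) and binary cases rely on different bridge functions that are not shown here, and the function names `kendall_tau` and `latent_correlation_continuous` are hypothetical.

```python
import math
import random


def kendall_tau(x, y):
    """Sample Kendall's tau (tau-a): mean sign agreement over all pairs i < j."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            sx = (x[i] > x[j]) - (x[i] < x[j])  # sign of x[i] - x[j]
            sy = (y[i] > y[j]) - (y[i] < y[j])  # sign of y[i] - y[j]
            total += sx * sy
    return 2.0 * total / (n * (n - 1))


def latent_correlation_continuous(x, y):
    """Continuous/continuous Gaussian copula case only: r = sin(pi/2 * tau)."""
    return math.sin(math.pi / 2.0 * kendall_tau(x, y))


# Simulate latent bivariate normal data with a known correlation (illustrative values).
rng = random.Random(0)
rho = 0.6
z1 = [rng.gauss(0.0, 1.0) for _ in range(800)]
z2 = [rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) for a in z1]

# Observed variables are strictly increasing transforms of the latent ones.
# Such transforms distort Pearson correlation but leave the ranks unchanged.
x = [math.exp(a) for a in z1]
y = [b ** 3 for b in z2]

estimate_observed = latent_correlation_continuous(x, y)
estimate_latent = latent_correlation_continuous(z1, z2)
print(f"true rho = {rho}, rank-based estimate = {estimate_observed:.3f}")
```

Because the estimate depends on the data only through ranks, applying it to the transformed variables gives the same value as applying it to the latent ones, which is the property that makes this family of estimators attractive when marginal distributions are unknown.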