Robust estimation of high-dimensional covariance and precision matrices

Avella‐Medina, Marco; Battey, Heather; Fan, Jianqing; Li, Quefeng

doi:10.1093/biomet/asy011

Cited by 69 publications

(58 citation statements)

References 30 publications

“…All of these applications present some shared challenges: In most cases, the number of features (genes, brain regions, microbial taxa) far exceeds the number of data samples; It is generally impossible, without making additional assumptions or incorporating domain knowledge, to distinguish between direct and indirect correlations; The choice of the correlation or similarity measure is often application-dependent. Methods for microbial ecology network estimation from metagenomic data could benefit greatly from recent advances in high dimensional correlation matrix estimation [67][68][69][70]. Work in progress is aimed at evaluating the applicability of such methods in constructing stable microbial ecology networks from metagenomic data.…”

Section: Discussion (mentioning)

confidence: 99%

“…Furthermore, the proposed method is able to achieve its best observed performance using only 50 samples for feature selection. Work in progress is aimed at further improving the two key components of NBBD, e.g., by incorporating recent advances in high dimensional correlation matrix estimation [67][68][69][70] to improve the reliability and the stability of the resulting networks, exploring improved node scoring methods. Other promising directions for future research include systematic evaluation of the NBBD framework for biomarker discovery from different types of omics data, integrative analyses of multi-omics data 71,72, e.g., using information-preserving low-dimensional network embeddings 73.…”

Section: Discussion (mentioning)

confidence: 99%

Biomarker discovery in inflammatory bowel diseases using network-based feature selection

Abbas, Matta, et al. 2019

Preprint

Reliable identification of inflammatory biomarkers from metagenomics data is a promising direction for developing non-invasive, cost-effective, and rapid clinical tests for early diagnosis of IBD. We present an integrative approach to Network-Based Biomarker Discovery (NBBD) which integrates network analysis methods for prioritizing potential biomarkers and machine learning techniques for assessing the discriminative power of the prioritized biomarkers. Using a large dataset of new-onset pediatric IBD metagenomics biopsy samples, we compare the performance of Random Forest (RF) classifiers trained on features selected using a representative set of traditional feature selection methods against the NBBD framework, configured using five different tools for inferring networks from metagenomics data and nine different methods for prioritizing biomarkers, as well as a hybrid approach combining the best traditional and NBBD-based feature selection. We also examine how the performance of the predictive models for IBD diagnosis varies as a function of the size of the data used for biomarker identification. Our results show that (i) NBBD is competitive with some of the state-of-the-art feature selection methods including Random Forest Feature Importance (RFFI) scores; and (ii) NBBD is especially effective in reliably identifying IBD biomarkers when the number of data samples available for biomarker discovery is small.

“…Both constructions, however, involve brute-force search over every direction in a d-dimensional ε-net, and thus are computationally intractable. From an element-wise perspective, Avella-Medina et al (2018) combined robust estimates of the first and second moments to obtain variance estimators. In practice, three potential drawbacks of this approach are: (i) the accumulated error consists of those from estimating the first and second moments, which may be significant; (ii) the diagonal variance estimators are not necessarily positive and therefore additional adjustments are required; and (iii) using the cross-validation to calibrate a total number of O(d^2) tuning parameters is computationally expensive.…”

Section: Overview Of the Previous Work (mentioning)

confidence: 99%
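To make the element-wise construction described in the statement above concrete, here is a minimal sketch that estimates each covariance entry as a robust second moment minus a product of robust first moments. It assumes a Huber-type M-estimator of the mean; the function names (`huber_mean`, `elementwise_robust_cov`) and the use of one shared threshold per moment are illustrative assumptions, not the cited authors' implementation. The statement itself mentions O(d^2) tuning parameters, which this sketch does not calibrate.

```python
import numpy as np

def huber_mean(x, tau, n_iter=100, tol=1e-10):
    """Huber M-estimate of the mean of a 1-D sample (illustrative helper).

    Solves mean(clip(x - mu, -tau, tau)) = 0 by fixed-point iteration,
    starting from the sample median.
    """
    mu = float(np.median(x))
    for _ in range(n_iter):
        step = float(np.mean(np.clip(x - mu, -tau, tau)))
        mu += step
        if abs(step) < tol:
            break
    return mu

def elementwise_robust_cov(X, tau_mean, tau_second):
    """Entry (u, v) = robust E[X_u X_v] - robust E[X_u] * robust E[X_v].

    Assumption: a single threshold per moment; a per-entry threshold
    would be needed to mirror the O(d^2) tuning parameters in the text.
    """
    n, d = X.shape
    m1 = np.array([huber_mean(X[:, j], tau_mean) for j in range(d)])
    S = np.empty((d, d))
    for u in range(d):
        for v in range(u, d):
            m2 = huber_mean(X[:, u] * X[:, v], tau_second)
            S[u, v] = S[v, u] = m2 - m1[u] * m1[v]
    return S
```

Because each entry is a difference of two separately estimated moments, nothing in this sketch forces the diagonal to be positive, which is consistent with drawback (ii) in the statement.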

“…Building on the ideas of and Avella-Medina et al (2018), we propose user-friendly tail-robust covariance estimators that enjoy desirable finite-sample deviation bounds under weak moment conditions. The constructed estimators only involve simple truncation techniques and are computationally friendly.…”

Section: Overview Of the Previous Work (mentioning)

confidence: 99%

“…More precisely, following the terminology used by Devroye et al (2016), it is called a δ-dependent sub-Gaussian estimator (under the max norm). Estimators of a similar type include those of , Minsker (2015), Brownlees, Joly and Lugosi (2015), Hsu and Sabato (2016), and Avella-Medina et al (2018), among others. For univariate mean estimation, Devroye et al (2016) proposed multiple-δ mean estimators that satisfy exponential-type concentration bounds uniformly over δ ∈ [δ_min, 1).…”

Section: Element-wise Truncated Estimator (mentioning)

confidence: 99%

User-Friendly Covariance Estimation for Heavy-Tailed Distributions

Yuan, Minsker, Ren, et al. 2019

Statist. Sci.

We offer a survey of recent results on covariance estimation for heavy-tailed distributions. By unifying ideas scattered in the literature, we propose user-friendly methods that facilitate practical implementation. Specifically, we introduce element-wise and spectrum-wise truncation operators, as well as their M-estimator counterparts, to robustify the sample covariance matrix. Different from the classical notion of robustness that is characterized by the breakdown property, we focus on the tail robustness which is evidenced by the connection between nonasymptotic deviation and confidence level. The key observation is that the estimators need to adapt to the sample size, dimensionality of the data and the noise level to achieve optimal tradeoff between bias and robustness. Furthermore, to facilitate their practical use, we propose data-driven procedures that automatically calibrate the tuning parameters. We demonstrate their applications to a series of structured models in high dimensions, including the bandable and low-rank covariance matrices and sparse precision matrices. Numerical studies lend strong support to the proposed methods.
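As a rough illustration of what an element-wise truncation operator might look like in practice, the sketch below clips each entry of a pairwise-difference covariance statistic at a threshold `tau`. Applying the truncation to products of pairwise differences (so that no separate mean estimate is needed) is an assumption made here for illustration, as are the function names and the single shared threshold; the abstract does not specify these details, and the data-driven calibration it mentions is not reproduced.

```python
import numpy as np

def truncate(u, tau):
    """Element-wise truncation: sign(u) * min(|u|, tau)."""
    return np.sign(u) * np.minimum(np.abs(u), tau)

def truncated_cov(X, tau):
    """Average of truncated half-products of pairwise differences.

    For every pair i < j, form (X_i - X_j)(X_i - X_j)^T / 2, truncate
    each entry at tau, and average over all n(n-1)/2 pairs.
    Assumption: one threshold tau shared by all d x d entries.
    """
    n, d = X.shape
    i, j = np.triu_indices(n, k=1)               # all pairs i < j
    D = X[i] - X[j]                              # shape (N, d)
    P = D[:, :, None] * D[:, None, :] / 2.0      # shape (N, d, d)
    return truncate(P, tau).mean(axis=0)
```

With `tau = np.inf` nothing is clipped and the result coincides with the usual unbiased sample covariance; smaller values of `tau` trade some bias for reduced sensitivity to heavy tails, which is the bias and robustness tradeoff the abstract refers to.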

Sparse precision matrix estimation under lower polynomial moment assumption

Miao, Wang 2023

Math Methods in App Sciences

Precision matrix (inverse covariance matrix) estimation is a rising challenge in contemporary applications while dealing with high‐dimensional data. This paper focuses on large‐scale precision matrix of the random vector that only has lower polynomial moments. We mainly investigate upper bounds of the proposed estimator under the spectral norm in terms of the probability and mean estimation respectively. It is shown that the data‐driven estimator is fully adaptive and achieves the same optimal convergence order as under Gaussian assumption on the data. Simulation studies further support our theoretical claims.
