Covariance selection by thresholding the sample correlation matrix

Jiang, Bernard C.

doi:10.1016/j.spl.2013.07.008

Cited by 13 publications

(40 citation statements)

References 9 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…(9) because the interpretation of the proportional threshold is quite simple. For the choice of the threshold value h ( α ), a cross-validation was introduced (e.g., Bickel and Levina [2008a], Jiang [2013]). The cross-validation procedure for the estimation of h ( α ) consists of three steps, as shown in Figure1.…”

Section: Methodsmentioning

confidence: 99%

“…The other type uses thresholding values to achieve a sparse structure. [Bickel and Levina, 2008a] proposed the thresholding matrix estimator and various related methods have been developed [Cai and Liu, 2011, Bickel and Levina, 2008b, El Karoui, 2008, Jiang, 2013]. In addition, to estimate sparse correlation matrix, [Rothman et al, 2009, Lam and Fan, 2009, Liu et al, 2014] used generalized thresholding operator based methods [Rothman et al, 2009], respectively.…”

Section: Introductionmentioning

confidence: 99%

“…In the proposed approach, we adopt an approach that uses hard thresholding based on [Bickel and Levina, 2008a] and [Jiang, 2013] because the approach is quite simple and easy to interpret the results. Therefore, to estimate sparse low-rank correlation matrix, we combine the majorize-minimization algorithm (MM algorithm) [Pietersz and Groenen, 2004, Simon and Abell, 2010] and the hard thresholding approach.…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

Thresholding Approach For Low-Rank Correlation Matrix Based On Mm Algorithm

Tanioka

Furotani

Hiwa

2021

Preprint

View full text Add to dashboard Cite

Background: Low-rank approximation is a very useful approach for interpreting the features of a correlation matrix; however, a low-rank approximation may result in estimation far from zero even if the corresponding original value was far from zero. In this case, the results lead to misinterpretation. Methods: To overcome these problems, we propose a new approach to estimate a sparse low-rank correlation matrix based on threshold values combined with cross-validation. In the proposed approach, the MM algorithm was used to estimate the sparse low-rank correlation matrix, and a grid search was performed to select the threshold values related to sparse estimation. Results: Through numerical simulation, we found that the FPR and average relative error of the proposed method were superior to those of the tandem approach. For the application of microarray gene expression, the FPRs of the proposed approach with d=2,3, and 5 were 0.128, 0.139, and 0.197, respectively, while FPR of the tandem approach was 0.285. Conclusions: We propose a novel approach to estimate sparse low-rank correlation matrix. The advantage of the proposed method is that it provides results that are easy to interpret and avoid misunderstandings. We demonstrated the superiority of the proposed method through both numerical simulations and real examples.

show abstract

Section: Methodsmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Thresholding Approach For Low-Rank Correlation Matrix Based On Mm Algorithm

Tanioka

Furotani

Hiwa

2021

Preprint

View full text Add to dashboard Cite

show abstract

“…This outperforms thresholding the covariance matrix directly and the consistency of doing so is justified in [16].…”

Section: Chapter 2 Related Workmentioning

confidence: 99%

Distributed Algorithms for Computing Very Large Thresholded Covariance Matrices

Gao

Jermaine

2016

ACM Trans. Knowl. Discov. Data

View full text Add to dashboard Cite

Computation of covariance matrices from observed data is an important problem, as such matrices are used in applications such as PCA, LDA, and increasingly in the learning and application of probabilistic graphical models. However, computing an empirical covariance matrix is not always an easy problem.There are two key difficulties associated with computing such a matrix from a very high-dimensional data set. The first problem is over-fitting. For a p-dimensional covariance matrix, there are p(p − 1)/2 unique, off-diagonal entries in the empirical covariance matrixŜ; for large p (say, p > 10 5 ) the size n of the data set is often much smaller than the number of covariances to compute. Over-fitting is a concern in any situation where the number of parameters learned can greatly exceed the size of the data set. Thus, there are strong theoretical reasons to expect that for highdimensional data-even Gaussian data-the empirical covariance matrix is not a good estimate for the true covariance matrix underlying the generative process.The second problem is computational. Computing a covariance matrix takes O(np 2 ) time. For large p (greater than 10,000) and n much greater than p, this is debilitating.The first problem (over-fitting) has been studied in depth in both the statistics and machine learning literature, but the second problem (computation) has been ignored, presumably because these fields have typically been concerned with relatively small data sets. In this thesis, we consider how both of these difficulties can be handled simultaneously. Specifically, a key regularization technique for high-dimensional covariance estimation is thresholding, where the smallest or least significant entries in the covariance matrix are simply dropped and replaced with the value 0. This suggests an obvious way to address the computational difficulty as well: first, compute the identities of the K entires in the covariance matrix that are actually important in the sense that they will not be removed during thresholding, and then in a second step, compute the values of those entries. This can be done in O(Kn) time. If K << p 2 and the identities of the important entries can be computed in reasonable time, then this is a big win.The key technical contribution of this thesis is the design and implementation of two different distributed algorithms for approximating the identities of the important entries quickly, using sampling. We have implemented these methods and tested them using an 800 core compute cluster. Experiments have been run using real data sets having millions of data points and up to 40, 000 dimensions. These experiments show that the proposed methods are both accurate and efficient.

show abstract

“…We compare the correlation coefficients matrices (after thresholding as described in [2,7]) for the two groups in each experiment (Workshops A and B, Seminars A and B).…”

Section: Correlation Coefficients Matricesmentioning

confidence: 99%

Do you know the speaker?

Lim

Jiang

Lim

et al. 2014

Proceedings of the 23rd International Conference on World Wide Web

Self Cite

View full text Add to dashboard Cite

With the widespread adoption of the Web, many companies and organizations have established websites that provide information and support online transactions (e.g., buying products or viewing content). Unfortunately, users have limited attention to spare for interacting with online sites. Hence, it is of utmost importance to design sites that attract user attention and effectively guide users to the product or content items they like. Thus, we propose a novel and scalable experimentation approach to evaluate the effectiveness of online site designs. Our case study focuses on the effects of an authority message on visitors' browsing behavior on workshop and seminar online announcement sites. An authority message emphasizes a particular prominent speaker and his/her achievements. Through dividing users into control and treatment groups and carefully tracking their online activities, we observe that the authority message influences the way users interact with page elements on the website and increases their interests in the authority speakers.

show abstract

Covariance selection by thresholding the sample correlation matrix

Cited by 13 publications

References 9 publications

Thresholding Approach For Low-Rank Correlation Matrix Based On Mm Algorithm

Thresholding Approach For Low-Rank Correlation Matrix Based On Mm Algorithm

Distributed Algorithms for Computing Very Large Thresholded Covariance Matrices

Do you know the speaker?

Contact Info

Product

Resources

About