2016
DOI: 10.1109/jproc.2015.2494178

Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining

Abstract: When can reliable inference be drawn in the “Big Data” context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, …

citations
Cited by 20 publications
(15 citation statements)
references
References 139 publications
“…Followed by the estimation of time courses, a sliding window is applied on the time courses that divides it into consecutive windows and an analysis on the time points within each window is performed (Allen et al, 2014). The analysis of dFNC patterns depends on the length of the window, where the use of a longer window length increases the risk of averaging the temporal fluctuations of interest resulting in false negatives (Preti et al, 2017), and the use of a shorter window length has too few samples for a reliable computation of correlation (Hero and Rajaratnam, 2016), resulting in the temporal variations to capture spurious fluctuations and increasing the risk of false positives (Sakoğlu et al, 2010; Hutchison et al, 2013; Leonardi and Van De Ville, 2015). Previous studies have shown that a window length between 30 and 60 s successfully estimates temporal fluctuations in resting-state functional magnetic resonance imaging (fMRI) data (Preti et al, 2017), and for most cases higher window lengths do not alter the results significantly (Keilholz et al, 2013; Li et al, 2014; Liégeois et al, 2016).…”
Section: Introduction (mentioning)
confidence: 99%
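The window-length trade-off described in the statement above can be illustrated with a minimal sketch. It assumes NumPy and two synthetic, independent signals. The function name `sliding_window_correlation`, the window lengths, and the random data are hypothetical choices for illustration and are not taken from the cited papers.

```python
import numpy as np


def sliding_window_correlation(x, y, window, step=1):
    """Pearson correlation of x and y inside each consecutive window."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    if not 2 <= window <= x.size:
        raise ValueError("window must be between 2 and len(x)")
    starts = range(0, x.size - window + 1, step)
    return np.array(
        [np.corrcoef(x[s:s + window], y[s:s + window])[0, 1] for s in starts]
    )


# Two independent signals: the true correlation is zero at every time point.
rng = np.random.default_rng(0)
x = rng.standard_normal(600)
y = rng.standard_normal(600)

short = sliding_window_correlation(x, y, window=10)   # few samples per window
long_ = sliding_window_correlation(x, y, window=100)  # many samples per window

# Spread of the windowed estimates around the true value of zero.
print("std with window=10: ", short.std())
print("std with window=100:", long_.std())
```

With so few samples per window, the short-window estimates scatter widely even though the signals are unrelated, which is the spurious-fluctuation risk the statement refers to. The longer window gives tighter estimates at the cost of averaging over any genuine temporal change.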
“…Hence, spatio-temporal dFNC analysis relaxes the assumption of stationarity in both the spatial and temporal domain, and provides a more general framework for capturing time-varying FNC patterns (Ma et al, 2014; Kottaram et al, 2018; Kunert-Graf et al, 2018). The availability of higher number of samples in the spatial domain also guarantees reliable estimation of functional correlations (Hero and Rajaratnam, 2016), thus providing a promising direction for the use of spatial domain for dFNC analysis. However, the methods used to extract time-varying spatio-temporal patterns face few challenges.…”
Section: Introduction (mentioning)
confidence: 99%
“…This is equivalent to the maximal kNN distance between columns, as measured by correlation distance. The theory from [17] helps us establish that the proposed summary statistic has a well defined exponential limiting distribution as p → ∞ for fixed n, the so-called "purely high dimensional regime" [1]. This summary statistic is related to the empirical distribution of the vertex degree of the correlation graph associated with the thresholded sample correlation matrix.…”
Section: Problem Description (mentioning)
confidence: 99%
“…The solution is optimal in the following sense. The theory from [6] establishes that a certain summary statistic, denoted by V (X), derived from an n × p random matrix X has a limiting distribution as p → ∞ for fixed n, the so-called "purely high dimensional regime" [16]. This summary statistic is related to the empirical distribution of the vertex degree of the correlation graph associated with the thresholded sample correlation matrix.…”
Section: Problem Descriptionmentioning
confidence: 99%