A nonparametric approach to high-dimensional k-sample comparison problems

Mukhopadhyay, Subhadeep; Wang, Kaijun

doi:10.1093/biomet/asaa015

Cited by 21 publications

(19 citation statements)

References 19 publications

(17 reference statements)


“…Construct the LP‐polynomial basis \{T_{j}(X; F_{X})\}_{j \geq 1} for the Hilbert space L^{2}(F) by applying Gram‐Schmidt orthonormalization (see Appendix A for more details) on the set of functions of the power of T_{1}(X; F_{X}):

T_{1}(x; F_{X}) = \frac{\sqrt{12} \{F^{mid}(x; F_{X}) - 1/2\}}{\sqrt{1 - \sum_{x} p^{3}(x; F_{X})}}.

LP‐bases obey the following orthonormality conditions with respect to the measure F:

\int T_{j}(x; F_{X}) \, dF(x; X) = 0, and \int T_{j}(x; F_{X}) T_{k}(x; F_{X}) \, dF(x; X) = \delta_{jk}.

For data analysis, construct the empirical LP basis (in short eLP basis) \{T_{j}(x; \tilde{F}_{X})\}_{j = 1, 2, \dots, m}, where m is strictly less than the number of unique values in the sample \{X_{1}, \dots, X_{n}\}. Note that our custom‐constructed basis functions are orthonormal polynomials of mid‐rank transform; for more details see Mukhopadhyay, Mukhopadhyay and Wang, and Mukhopadhyay and Parzen. LP‐orthonormal system plays a fundamental role in construct...…”

Section: United Lp‐nonparametric Methods

mentioning

confidence: 99%
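The construction quoted above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. It assumes the standard mid-distribution definition F^{mid}(x) = F(x) - p(x)/2, which the quoted passage does not spell out, and it takes expectations under the empirical distribution of the sample. The function name `elp_basis` and its return layout are hypothetical.

```python
import math
from collections import Counter


def elp_basis(sample, m):
    """Return (values, p, basis): the sorted unique values, their empirical
    probabilities, and m basis functions evaluated at those unique values."""
    n = len(sample)
    counts = Counter(sample)
    values = sorted(counts)
    if m >= len(values):
        raise ValueError("m must be strictly less than the number of unique values")

    # Empirical probability mass at each unique value.
    p = [counts[v] / n for v in values]

    # Mid-distribution transform: F(x) - p(x)/2 (assumed standard definition).
    f_mid, cum = [], 0.0
    for pv in p:
        f_mid.append(cum + 0.5 * pv)
        cum += pv

    # First basis function T_1, following the formula in the quoted statement.
    scale = math.sqrt(1.0 - sum(pv ** 3 for pv in p))
    t1 = [math.sqrt(12.0) * (f - 0.5) / scale for f in f_mid]

    def inner(u, v):
        # Inner product with respect to the empirical measure.
        return sum(pv * a * b for pv, a, b in zip(p, u, v))

    # Modified Gram-Schmidt on the powers T_1, T_1^2, ..., T_1^m,
    # also orthogonalizing against the constant function.
    basis = [[1.0] * len(values)]
    for j in range(1, m + 1):
        cand = [t ** j for t in t1]
        for b in basis:
            c = inner(cand, b)
            cand = [a - c * bb for a, bb in zip(cand, b)]
        norm = math.sqrt(inner(cand, cand))
        basis.append([a / norm for a in cand])

    return values, p, basis[1:]


# Hypothetical sample with ties, so the mid-rank correction matters.
values, p, basis = elp_basis([1, 2, 2, 3, 3, 3, 5, 8], m=3)
```

Each returned function has mean zero and unit variance under the empirical measure, and distinct functions are uncorrelated, which mirrors the orthonormality conditions stated in the quote.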

Nonparametric universal copula modeling

Mukhopadhyay

Parzen

2020

Appl Stoch Models Bus & Ind

Self Cite

To handle the ubiquitous problem of “dependence learning,” copulas are quickly becoming a pervasive tool across a wide range of data‐driven disciplines encompassing neuroscience, finance, econometrics, genomics, social science, machine learning, healthcare, and many more. At the same time, despite their practical value, the empirical methods of “learning copula from data” have been unsystematic with full of case‐specific recipes. Taking inspiration from modern LP‐nonparametrics, this paper presents a modest contribution to the need for a more unified and structured approach of copula modeling that is simultaneously valid for arbitrary combinations of continuous and discrete variables.

“…One can also compute p-values using the \chi^{2}_{m} null distribution of qDIV. For more details see Mukhopadhyay and Wang (2020).…”

Section: Goodness-of-fit Diagnostics

mentioning

confidence: 99%
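As a rough illustration of the p-value step mentioned in the statement above, the sketch below evaluates the upper tail of a chi-square distribution with integer degrees of freedom m using only the Python standard library. How qDIV itself is computed is not given in the quoted text, so the statistic is treated as an already-computed number. The names `chi2_sf` and `qdiv_value`, and the numbers used, are hypothetical.

```python
import math


def chi2_sf(x, m):
    """Upper-tail probability P(chi2_m > x) for integer degrees of freedom m >= 1."""
    if m < 1 or int(m) != m:
        raise ValueError("m must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Start from the closed form for m = 2 (even case) or m = 1 (odd case).
    if m % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Incomplete-gamma recurrence: Q(a + 1, z) = Q(a, z) + z^a e^{-z} / Gamma(a + 1).
    while a < m / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Hypothetical observed statistic and number of components.
qdiv_value = 9.2
m = 4
p_value = chi2_sf(qdiv_value, m)
```

Where SciPy is available, `scipy.stats.chi2.sf(qdiv_value, m)` should give the same tail probability; the standard-library version is used here only to keep the sketch dependency-free.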

“…As a result, one can expand d_{x}(F_{Y}(y)) in the orthonormal basis of F_{Y}(y). One such orthonormal system is the LP-family of rank-polynomials (Mukhopadhyay, 2017a; Bruce et al., 2019; Mukhopadhyay and Wang, 2020a), which we denote \{T_{j}(y; F_{Y})\} to emphasize that they are polynomials of F_{Y}(Y), not Y, hence extremely robust. As the true F_{Y} is unknown, we will instead use the empirical LP-bases (eLP) \{T_{j}(y; \tilde{F}_{Y})\} for our data analysis.…”

Section: A Robust Learning Theory

mentioning

confidence: 99%

Breiman's "Two Cultures" Revisited and Reconciled

Mukhopadhyay

Wang

2021

Preprint

Self Cite

“…To prove Corollary 4.1, we need to show that there exists a value D_{\alpha} for which

P\left( \bigcup_{K=1}^{M} \left[ \{ D_{(K)} > D_{\alpha} \} \cap \{ K^{*} = K \} \right] \,\middle|\, H_{0} \right) \leq \alpha, (B.22)

with D_{(K)} = \sum_{(k)=1}^{K} \theta^{2}_{(k)}, and the event \{K^{*} = K\} indicates that K is the value selected by the AIC or BIC procedure in (4.7). The left-hand side of (B.22) is the probability that at least one model leads to incorrectly rejecting H_{0} when selected (while accounting for the selection probability).…”

mentioning

confidence: 99%
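One way to unpack the bound in (B.22) is sketched below. It assumes the probability is taken over a union across K = 1, ..., M and conditionally on H_{0}, which the extracted text leaves ambiguous. Because the selection procedure returns exactly one value K^{*}, the events \{K^{*} = K\} are disjoint, so the union probability splits into a sum. This is a reading aid under that assumption, not a step taken from the cited proof.

```latex
P\left( \bigcup_{K=1}^{M} \left[ \{ D_{(K)} > D_{\alpha} \} \cap \{ K^{*} = K \} \right] \,\middle|\, H_{0} \right)
  = \sum_{K=1}^{M} P\left( D_{(K)} > D_{\alpha},\; K^{*} = K \,\middle|\, H_{0} \right)
  = \sum_{K=1}^{M} P\left( D_{(K)} > D_{\alpha} \,\middle|\, K^{*} = K,\, H_{0} \right)
      P\left( K^{*} = K \,\middle|\, H_{0} \right).
```

The last equality holds termwise whenever P(K^{*} = K | H_{0}) > 0; terms with zero selection probability contribute nothing. Written this way, each model's false-rejection probability is weighted by its probability of being selected, which matches the phrase "while accounting for the selection probability" in the statement.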

“…http://fermi.gsfc.nasa.gov/ssc/data/analysis/software [footnote 3] In the LP acronym, the letter L typically denotes nonparametric methods based on quantiles, whereas P stands for polynomials [22, Supp S1].…”

mentioning

confidence: 99%

Informative goodness-of-fit for multivariate distributions

Algeri

2021

Electron. J. Statist.

This article discusses an informative goodness-of-fit (iGOF) approach to study multivariate distributions. When the null model is rejected, iGOF allows us to identify the underlying sources of mismodeling and naturally equips practitioners with additional insights on the nature of the deviations from the true distribution. The informative character of the procedure is achieved by exploiting smooth tests and random field theory to facilitate the analysis of multivariate data. Simulation studies show that iGOF enjoys high power for different types of alternatives. The methods presented here directly address the problem of background mismodeling arising in physics and astronomy. It is in these areas that the motivation of this work is rooted.
