Dataset Shift in Machine Learning 2008
DOI: 10.7551/mitpress/9780262170055.003.0008
Covariate Shift by Kernel Mean Matching

Abstract: Given sets of observations of training and test data, we consider the problem of re-weighting the training data such that its distribution more closely matches that of the test data. We achieve this goal by matching covariate distributions between training and test sets in a high dimensional feature space (specifically, a reproducing kernel Hilbert space). This approach does not require distribution estimation. Instead, the sample weights are obtained by a simple quadratic programming procedure. We provide a u…
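The quadratic program described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the chapter's implementation: it uses an RBF kernel and scipy's SLSQP solver in place of a dedicated QP solver, and the kernel width `sigma`, weight bound `B`, and normalization tolerance `eps` are illustrative defaults chosen here, not values from the chapter.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian RBF kernel matrix between rows of X and rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kmm_weights(X_tr, X_te, sigma=1.0, B=10.0, eps=None):
    """Kernel mean matching sketch: re-weight training samples so their
    kernel mean matches the test sample's kernel mean.

    Solves  min_b  0.5 b'Kb - kappa'b
    s.t.    0 <= b_i <= B,  |sum_i b_i - n_tr| <= n_tr * eps
    where K is the train/train kernel matrix and kappa averages the
    train/test kernel evaluations.
    """
    n_tr, n_te = len(X_tr), len(X_te)
    if eps is None:
        eps = B / np.sqrt(n_tr)  # illustrative default
    K = rbf_kernel(X_tr, X_tr, sigma)
    kappa = (n_tr / n_te) * rbf_kernel(X_tr, X_te, sigma).sum(axis=1)

    fun = lambda b: 0.5 * b @ K @ b - kappa @ b
    jac = lambda b: K @ b - kappa
    # |sum(b) - n_tr| <= n_tr*eps, split into two linear inequalities
    cons = [
        {"type": "ineq", "fun": lambda b: n_tr * eps - (b.sum() - n_tr)},
        {"type": "ineq", "fun": lambda b: n_tr * eps + (b.sum() - n_tr)},
    ]
    res = minimize(fun, np.ones(n_tr), jac=jac,
                   bounds=[(0.0, B)] * n_tr, constraints=cons,
                   method="SLSQP")
    return res.x
```

With training data drawn from one distribution and test data from a shifted one, the returned weights should up-weight training points in the region the test distribution favors, which is exactly the mean-discrepancy matching the citing papers below describe.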

Cited by 465 publications (654 citation statements). References 26 publications.
“…, X_n without going through density estimation (Gretton et al. 2009). The basic idea of KMM is to find w_0(x) such that the mean discrepancy between nonlinearly transformed samples drawn from P and Q is minimized in a universal reproducing kernel Hilbert space (Steinwart 2001).…”
Section: Kernel Mean Matching (KMM)
Confidence: 99%
“…The kernel mean matching (KMM) method (Gretton et al. 2009) directly gives estimates of the density ratio by matching the two distributions using universal reproducing kernel Hilbert spaces (Steinwart 2001). KMM can be regarded as a kernelized variant of Qin's moment matching estimator (Qin 1998).…”
Section: Introduction
Confidence: 99%
“…Thus, direct density-ratio estimation is substantially easier than density estimation. Following this idea, methods of direct density-ratio estimation have been developed, e.g., kernel mean matching (Gretton et al., 2009), the logistic-regression method (Bickel et al., 2007), and the Kullback-Leibler importance estimation procedure (KLIEP) (Sugiyama et al., 2008). In the context of change-point detection, KLIEP was reported to outperform other approaches (Kawahara and Sugiyama, 2012) such as the one-class support vector machine (Schölkopf et al., 2001; Desobry et al., 2005) and singular-spectrum analysis (Moskvina and Zhigljavsky, 2003b).…”
Section: Introduction
Confidence: 99%
“…Then an additional variable, s_i, is defined for each sample of the training data set [21,14]. s_i is set to depend only on one of the sample's features; the biasing procedure is therefore called simple bias [6]. This additional variable determines whether the corresponding sample contributes to the biased training data set or not.…”
Section: Real World Data Sets
Confidence: 99%
“…"Covariate shift" [5,6] and "class imbalance" [7] are two examples with different initial assumptions:…”
Section: Overview
Confidence: 99%