2013
DOI: 10.1214/13-aos1092
Density-sensitive semisupervised inference

Abstract: Semisupervised methods are techniques for using labeled data $(X_1,Y_1),\ldots,(X_n,Y_n)$ together with unlabeled data $X_{n+1},\ldots,X_N$ to make predictions. These methods invoke assumptions that link the marginal distribution $P_X$ of $X$ to the regression function $f(x)$. For example, it is common to assume that $f$ is very smooth over high-density regions of $P_X$. Many of the methods are ad hoc and have been shown to work in specific examples but lack a theoretical foundation. We provide a minimax…
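The smoothness-over-high-density assumption in the abstract can be illustrated with a standard graph-based label propagation scheme: predictions spread from labeled to unlabeled points along a Gaussian-affinity graph, so they vary slowly within dense clusters. This is a minimal sketch of that generic idea, not the paper's own estimator; the function name and parameters (`sigma`, `n_iter`) are illustrative choices.

```python
import numpy as np

def label_propagation(X_labeled, y_labeled, X_unlabeled, sigma=0.5, n_iter=200):
    """Predict values at unlabeled points by propagating labels along a
    Gaussian-affinity graph, so predictions vary smoothly over
    high-density regions (illustrative sketch, not the paper's method)."""
    X = np.vstack([X_labeled, X_unlabeled])
    n_l = len(X_labeled)
    # Pairwise squared distances and Gaussian affinities
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    f = np.zeros(len(X))
    f[:n_l] = y_labeled
    for _ in range(n_iter):
        f = P @ f
        f[:n_l] = y_labeled  # clamp the labeled points each iteration
    return f[n_l:]
```

With two well-separated clusters and one labeled point per cluster, the unlabeled points inherit their cluster's label, exactly because the regression estimate is forced to be smooth where $P_X$ puts mass.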

Cited by 18 publications (20 citation statements) · References 18 publications
“…In our setting, determining whether or not a complication occurred for a given case often requires extensive manual review of the case file and is subject to varying clinical definitions of outcomes. A promising modeling approach for this situation is semisupervised inference, such as discussed in Azizyan et al. [22]. Existing formal methods for semisupervised inference typically make assumptions linking the probability density function of the predictor variables to the function used for outcome prediction. It is unclear if such methods would be effective in our setting with a high-dimensional predictor set and low fraction of labeled data; we have not pursued this approach here.…”
Section: Complication Outcome Variables
confidence: 99%
“…It is known that the affinity between data points depends not only on the location but also on the neighborhoods of two points. The density of data has been considered by various metrics in supervised [7], unsupervised [8], semisupervised [9] and deep learning [10]. The affinity between two data points with different neighborhood density is different from the affinity when both neighborhoods have the same density [11].…”
Section: Related Work
confidence: 99%
“…A density of samples in a data space has been already taken into account by various metrics used for different machine learning tasks: unsupervised learning (Soleimani et al 2015); supervised learning (Plant et al 2006 or Aggarwal 2007); semi-supervised learning (Azizyan et al 2013); reinforcement learning (Rojanaarpa and Kataeva 2016); deep learning (Nicolau and McDermott 2016); matchmaking for queries and ontologies (Naumenko et al 2006). The common assumption is that the two data samples x and y from the regions with different density of neighboring data samples must be considered more distant than in the case when both regions have the same density; and therefore some penalty is applied to a spatial distance between them.…”
Section: Introduction
confidence: 99%
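The density penalty described in the quoted passage — inflating the distance between two points whose neighborhoods have different densities — can be sketched concretely. This is one possible penalty under simple assumptions (a k-NN density estimate and a max/min density ratio as the penalty factor); the cited papers each use their own metric, and the helper names here are illustrative.

```python
import numpy as np

def knn_density(X, k=5):
    """Crude local density estimate: inverse of the mean distance to
    the k nearest neighbors (higher value = denser neighborhood)."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    knn = np.sort(d, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    return 1.0 / knn.mean(axis=1)

def density_penalized_distance(X, i, j, k=5):
    """Euclidean distance between X[i] and X[j], inflated when their
    neighborhood densities differ. The penalty factor max/min >= 1,
    so equal-density pairs are left (almost) unchanged."""
    rho = knn_density(X, k)
    eucl = np.linalg.norm(X[i] - X[j])
    penalty = max(rho[i], rho[j]) / min(rho[i], rho[j])
    return eucl * penalty
```

Because the penalty factor is a density ratio bounded below by 1, a pair straddling a dense cluster and a sparse region is pushed farther apart than its raw spatial separation, which is exactly the behavior the quoted passage attributes to these metrics.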