2020
DOI: 10.48550/arxiv.2010.03622
Preprint
Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

Abstract: Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks. However, the current theoretical understanding of self-training only applies to linear models. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning. At the core of our analysis is a simple but realist…

Cited by 31 publications (43 citation statements)
References 57 publications
“…An alternative way to define φ(·) for a sample x_l is the discrepancy between the model's prediction on the sample and on its adversarial neighbor (Wei et al 2020). An adversarial neighbor is a sample that is similar to x_l in terms of the input graph g_l but has the most different prediction.…”
Section: Uncertainty Measurements (mentioning)
confidence: 99%
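The excerpt above describes an uncertainty measure based on input consistency with an adversarial neighbor. Below is a minimal sketch of that idea, assuming a PyTorch classifier and a VAT-style perturbation (Miyato et al. 2018) in place of a graph-based neighbor; the function names, the single power-iteration step, and the choice of KL divergence are illustrative assumptions, not the cited papers' exact formulation.

import torch
import torch.nn.functional as F

def _l2_normalize(d):
    # Normalize each sample's perturbation to unit L2 norm.
    norm = d.flatten(1).norm(dim=1).view(-1, *([1] * (d.dim() - 1)))
    return d / (norm + 1e-12)

def adversarial_neighbor_discrepancy(model, x, xi=1e-6, eps=1.0):
    # Hypothetical uncertainty score phi(x): divergence between the model's
    # prediction on x and on a VAT-style adversarial neighbor x + r_adv.
    with torch.no_grad():
        p = F.softmax(model(x), dim=1)            # prediction on the original sample

    # One power-iteration step to approximate the direction that
    # changes the prediction the most.
    d = _l2_normalize(torch.randn_like(x)).mul_(xi).detach().requires_grad_(True)
    adv_div = F.kl_div(F.log_softmax(model(x + d), dim=1), p, reduction="batchmean")
    grad = torch.autograd.grad(adv_div, d)[0]

    with torch.no_grad():
        r_adv = eps * _l2_normalize(grad)         # offset to the adversarial neighbor
        p_adv = F.log_softmax(model(x + r_adv), dim=1)
        # Per-sample discrepancy: a larger value means a less consistent,
        # hence more uncertain, prediction.
        return F.kl_div(p_adv, p, reduction="none").sum(dim=1)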
“…gorithm and show its convergence under proper initialization. Recent theoretical analysis (Wei et al 2020) and empirical evidence show that an input-consistency loss such as the VAT loss (Miyato et al 2018) can further improve pseudolabeling in semi-supervised learning. Han et al (2019) point out that pseudolabel imputation can be viewed as minimizing the min-entropy, a type of Rényi entropy $\frac{1}{1-\alpha}\log\bigl(\sum_{i=1}^{n} p_i^{\alpha}\bigr)$ with α → ∞, while the Shannon entropy in (Grandvalet and Bengio 2005) is the case α → 1.…”
Section: Related Work (mentioning)
confidence: 99%
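For reference, the entropy relationship quoted above is a standard identity rather than anything specific to the cited works: the Rényi entropy is $H_\alpha(p) = \frac{1}{1-\alpha}\log\bigl(\sum_{i=1}^{n} p_i^{\alpha}\bigr)$, which recovers the Shannon entropy $-\sum_{i=1}^{n} p_i \log p_i$ in the limit $\alpha \to 1$ and the min-entropy $-\log \max_i p_i$ as $\alpha \to \infty$.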
“…However, it has been shown that such a procedure may suffer from local minima (Grandvalet and Bengio 2005) or over-confident wrong pseudolabels (Zou et al 2019). Wei et al (2020) show that when the underlying data distribution and the pseudolabeler satisfy the expansion assumption (see Definition 3.1 and Assumptions 4.1 and 3.3 in Wei et al (2020)), self-training algorithms with input consistency are able to improve on the pseudolabeler (Theorem 4.3 in Wei et al (2020)). Intuitively, the condition states that there need to be many correctly-labeled neighbors around the errors made by the pseudolabeler, so that the correct labels can refine the decision boundary.…”
Section: Algorithm (mentioning)
confidence: 99%
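A minimal sketch of the kind of objective this excerpt describes: fit confident pseudolabels from a fixed pseudolabeler while penalizing input inconsistency between perturbed copies of each unlabeled example. It assumes PyTorch; the teacher/student naming, the augment function, the confidence threshold tau, and the weight lambda_c are illustrative assumptions, not the exact algorithm analyzed in Wei et al (2020).

import torch
import torch.nn.functional as F

def self_training_step(student, teacher, x_unlabeled, augment, optimizer,
                       lambda_c=1.0, tau=0.9):
    # One update on unlabeled data: masked cross-entropy on pseudolabels from a
    # fixed teacher, plus an input-consistency penalty between two augmented views.
    with torch.no_grad():
        probs = F.softmax(teacher(x_unlabeled), dim=1)   # pseudolabeler predictions
        conf, pseudo = probs.max(dim=1)
        mask = (conf >= tau).float()                     # keep only confident pseudolabels

    view_a, view_b = augment(x_unlabeled), augment(x_unlabeled)
    logits_a, logits_b = student(view_a), student(view_b)

    # Pseudolabel-fitting term (masked cross-entropy on one view).
    ce = F.cross_entropy(logits_a, pseudo, reduction="none")
    pseudo_loss = (mask * ce).mean()

    # Input-consistency term: the two views of the same input should agree.
    consistency = F.kl_div(F.log_softmax(logits_b, dim=1),
                           F.softmax(logits_a, dim=1).detach(),
                           reduction="batchmean")

    loss = pseudo_loss + lambda_c * consistency
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()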