2019
DOI: 10.1007/978-3-030-32695-1_8

Knowledge Distillation for Semi-supervised Domain Adaptation

Abstract: In the absence of sufficient data variation (e.g., scanner and protocol variability) in annotated data, deep neural networks (DNNs) tend to overfit during training. As a result, their performance is significantly lower on data from unseen sources compared to the performance on data from the same source as the training data. Semi-supervised domain adaptation methods can alleviate this problem by tuning networks to new target domains without the need for annotated data from these domains. Adversarial domain adap…

Cited by 25 publications (9 citation statements)
References 10 publications
“…Among the surveyed 31 symmetric approaches, direct approaches operated on the feature representations across domains by minimizing their differences (via mutual information [63], maximum mean discrepancy [46, 49, 64], Euclidean distance [65, 66, 67, 68, 69, 70, 71], Wasserstein distance [72], and average likelihood [73]), maximizing their correlation [74, 75] or covariance [36], and introducing sparsity with L1/L2 norms [42, 76]. On the other hand, indirect approaches were applied via adversarial training [28, 41, 54, 77, 78, 79, 80, 81, 82, 83, 84, 85] and knowledge distillation [86].…”
Section: Results
Mentioning confidence: 99%
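
To make the direct, distance-based alignment concrete, the sketch below shows a maximum mean discrepancy (MMD) penalty between source- and target-domain feature batches. It is a minimal illustration assuming a PyTorch encoder and a single RBF bandwidth; the function names and the weighting are hypothetical and not taken from any of the cited works.

```python
import torch


def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel matrix between two batches of feature vectors."""
    dists = torch.cdist(x, y) ** 2          # pairwise squared Euclidean distances
    return torch.exp(-dists / (2 * sigma ** 2))


def mmd_loss(source_feats, target_feats, sigma=1.0):
    """Biased estimate of the squared MMD between source and target features.

    A small value means the two feature distributions are hard to tell apart,
    which is what symmetric feature-based adaptation methods aim for.
    """
    k_ss = rbf_kernel(source_feats, source_feats, sigma).mean()
    k_tt = rbf_kernel(target_feats, target_feats, sigma).mean()
    k_st = rbf_kernel(source_feats, target_feats, sigma).mean()
    return k_ss + k_tt - 2 * k_st


# Hypothetical usage: features extracted by a shared encoder from both domains.
# total_loss = task_loss(source_logits, source_labels) + lam * mmd_loss(f_src, f_tgt)
```

In practice, several kernel bandwidths are often combined, and the alignment term is weighted against the supervised task loss on the source domain.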
“…Additionally, various indirect symmetric feature-based approaches jointly optimized an adversarial loss and a task-specific loss on the source domain images [79, 80, 83]. Finally, Orbes-Arteaga et al. [86] used knowledge distillation, training a teacher model on the labeled source domain and optimizing a student network on the probabilistic maps from the teacher model derived with the source and target domain images.…”
Section: Results
Mentioning confidence: 99%
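
The distillation scheme summarized in this statement, a teacher trained on labeled source data and a student trained on the teacher's probability maps for source and target images, can be sketched roughly as below. This is an assumed reconstruction for illustration, not the authors' code; the model objects, temperature, and data loaders are hypothetical.

```python
import torch
import torch.nn.functional as F


def distill_step(teacher, student, images, optimizer, temperature=2.0):
    """One student update on a batch of source *or* target images.

    The teacher (pretrained on the labeled source domain) provides soft
    probability maps; the student mimics them, so no target labels are needed.
    """
    with torch.no_grad():
        soft_targets = F.softmax(teacher(images) / temperature, dim=1)

    student_log_probs = F.log_softmax(student(images) / temperature, dim=1)
    loss = F.kl_div(student_log_probs, soft_targets, reduction="batchmean")
    loss = loss * temperature ** 2            # usual scaling for soft targets

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Hypothetical training loop: the student sees image batches drawn from both
# the source and the target loader, with ground-truth labels for neither.
# for images in itertools.chain(source_loader, target_loader):
#     distill_step(teacher, student, images, optimizer)
```

Because the student only matches the teacher's soft outputs, no annotations are required for the target domain.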
“…Buciluǎ et al. [11] have proposed compressing a large model into a simple model, which reduces space requirements and increases inference speed at the cost of a small performance loss. This idea has been revisited in [12] under the name knowledge distillation (KD) and has gathered a significant amount of attention [13, 14, 15, 16, 17]. KD considers a teacher network and a student network.…”
Section: Related Work
Mentioning confidence: 99%
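
For reference, the teacher-student objective popularized by Hinton et al. (reference [12] in the quoted passage) combines a temperature-softened KL term with the usual cross-entropy on hard labels. The sketch below is one common formulation; the weighting `alpha` and the temperature are illustrative assumptions, not values from the cited papers.

```python
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Classic knowledge-distillation objective.

    alpha weights the soft (teacher-matching) term against the standard
    cross-entropy on the ground-truth labels.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```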
“…Therefore, other unsupervised domain adaptation methods are active research topics in medical image analysis. A recent work by Orbes-Arteaga et al. (2019) proposed an unsupervised domain adaptation approach in a similar fashion to transfer learning, with a teacher-student learning strategy. The authors used a knowledge-distillation technique in which a supervised teacher model is used to train a student network by generating soft labels for the target domain.…”
Section: Introduction
Mentioning confidence: 99%