“…Unlike traditional two-stage knowledge distillation schemes, self-distillation is a one-phase training scheme that incurs no extra computation cost: the model gradually uses its own knowledge to soften the ground-truth targets and thereby improve generalization. Specifically, let $p^{t}$ denote the prediction from the mean-teacher branch. We can utilize this mean-teacher prediction to soften the ground-truth label $y$, described as

$$\mathcal{L}_{sd} = -\sum_{c}\big(\alpha\, p^{t}_{c} + (1-\alpha)\, y_{c}\big)\log p^{s}_{c}$$ [31, 32],

where $p^{s}$ is the current student prediction and $\alpha$ controls how much we trust the knowledge from the teacher. In our approach, the uniform and reversed samplings are fed into the teacher branches to obtain the predictions $p^{t}_{u}$ and $p^{t}_{r}$ under the two different distributions.…”
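As a rough illustration of the softened-target loss described above, the following is a minimal PyTorch sketch; the function name, the `alpha` default, and the commented two-branch combination are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_probs, targets, alpha=0.5):
    """Cross-entropy against ground-truth labels softened by a mean-teacher prediction.

    student_logits: (N, C) raw logits from the student branch
    teacher_probs:  (N, C) softmax probabilities from the mean-teacher branch
    targets:        (N,)   integer class labels
    alpha:          trust placed in the teacher (0 -> plain CE, 1 -> pure distillation)
    """
    num_classes = student_logits.size(1)
    one_hot = F.one_hot(targets, num_classes).float()
    # Softened target: interpolate between the teacher prediction and the ground truth.
    soft_target = alpha * teacher_probs.detach() + (1.0 - alpha) * one_hot
    log_probs = F.log_softmax(student_logits, dim=1)
    # Soft cross-entropy: -sum_c soft_target_c * log p^s_c, averaged over the batch.
    return -(soft_target * log_probs).sum(dim=1).mean()

# With two teacher branches (uniform and reversed sampling), the same softening
# could hypothetically be applied with each branch's prediction, e.g.:
# loss = self_distillation_loss(s_logits, p_teacher_uniform, y, alpha) \
#      + self_distillation_loss(s_logits, p_teacher_reversed, y, alpha)
```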