Proceedings of the 24th International Conference on Enterprise Information Systems 2022
DOI: 10.5220/0011114900003179

Mechanism of Overfitting Avoidance Techniques for Training Deep Neural Networks

Cited by 13 publications (8 citation statements) | References 0 publications
“…After receiving a post-processed VAE-generated image $I$, this MLP output a label distribution $p(x|I)$ ($x = 0, 1, \cdots, 9$). In Figure 4e, $H_{\text{realistic}} = \mathbb{E}_I[-\sum_x p(x|I) \ln p(x|I)]$, where $\mathbb{E}_I[\cdot]$ means the average over all post-processed VAE-generated images [99]; in Figure 4f, $H_{\text{xcat}} = -\sum_x \mathbb{E}_I[p(x|I)] \ln \mathbb{E}_I[p(x|I)]$ [99]. To plot Figure 4g, we first chose the post-processed VAE-generated images with high realisticity (i.e., $\max_x p(x|I) > 0.9$); then, for all the images belonging to a category $x$, we calculated the variance $\lambda_i(x)$ along the $i$-th principal component (PC). $D_{\text{incat}}$ was defined as .…”
Section: Methods
confidence: 99%
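To make the two entropy measures concrete, here is a minimal NumPy sketch of how $H_{\text{realistic}}$ and $H_{\text{xcat}}$ could be computed from per-image label distributions; the array name `probs`, its shape, and the helper name are illustrative assumptions, not code from the cited paper.

```python
import numpy as np

def entropy_measures(probs, eps=1e-12):
    """probs: hypothetical (N, 10) array, probs[n, x] = p(x|I_n) from the MLP."""
    # H_realistic: mean per-image entropy of p(x|I); low values mean each
    # generated image is confidently assigned to one digit category.
    per_image_entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    h_realistic = per_image_entropy.mean()

    # H_xcat: entropy of the mean label distribution E_I[p(x|I)]; high
    # values mean the generated images cover the categories evenly.
    mean_dist = probs.mean(axis=0)
    h_xcat = -np.sum(mean_dist * np.log(mean_dist + eps))
    return h_realistic, h_xcat
```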
“…According to the authors of SSGAN (Salimans et al., 2016), “in practice, $L_{\text{unsup}}$ will only help if it is not trivial to minimize for our classifier and we thus need to train G to approximate the data distribution,” which explains that, while the $L_{\text{unsup}}$ of D and G converge at the same rhythm with CamemBERT and with ChouBERT-16, the troubled decrease of $L_{\text{sup}}$ with ChouBERT-16 renders worse F1 scores than those with CamemBERT. For example, in the group with 16 training examples (see Figure 6), the test F1 observation scores with ChouBERT-16 switch between 0 and 0.43, which means that the classifier predicts either all as non-observation or all as observation.…”
Section: Results and Evaluation
confidence: 99%
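For reference, the loss decomposition discussed above can be written down compactly. Below is a minimal PyTorch sketch of the discriminator-side $L_{\text{sup}}$ and $L_{\text{unsup}}$ from Salimans et al. (2016), using the common trick of fixing the logit of the fake class to 0 so that $D(x) = Z(x)/(Z(x)+1)$ with $Z(x) = \sum_k \exp l_k(x)$; all tensor names and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def ssgan_discriminator_losses(logits_labeled, labels,
                               logits_unlabeled, logits_fake):
    # L_sup: ordinary K-class cross-entropy on the labeled real examples.
    l_sup = F.cross_entropy(logits_labeled, labels)

    # With log Z(x) = logsumexp over the K class logits:
    #   log D(x)      = log Z - softplus(log Z)
    #   log(1 - D(x)) = -softplus(log Z)
    log_z_real = torch.logsumexp(logits_unlabeled, dim=1)
    log_z_fake = torch.logsumexp(logits_fake, dim=1)
    l_unsup = (-(log_z_real - F.softplus(log_z_real)).mean()
               + F.softplus(log_z_fake).mean())
    return l_sup, l_unsup
```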
“…Many variants of GANs are proposed to improve sample generation and the stability of training. Some of these variants are the conditional GANs (CGANs), where the generator is conditioned on one or more labels (Mirza and Osindero, 2014), and semi-supervised GANs (SS-GANs) (Salimans et al., 2016), where the discriminator is trained over its $k$-labeled examples plus the data generated by the generator as a new label “$k+1$” (see Figure 1).…”
Section: Introduction
confidence: 99%
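As a sketch of that “$k+1$” construction, the snippet below widens an ordinary $k$-way classifier head by one output and assigns every generator sample the extra class index; the layer sizes, class count, and random stand-in data are hypothetical.

```python
import torch
import torch.nn as nn

K = 10  # hypothetical number of real classes

# SS-GAN-style discriminator: K real classes plus one "generated" class.
discriminator = nn.Sequential(
    nn.Linear(784, 256),    # illustrative feature extractor for flat inputs
    nn.ReLU(),
    nn.Linear(256, K + 1),  # K real classes + 1 fake class
)

real_x, real_y = torch.randn(32, 784), torch.randint(0, K, (32,))
fake_x = torch.randn(32, 784)   # stand-in for generator output
fake_y = torch.full((32,), K)   # every generated sample gets label K

loss = nn.CrossEntropyLoss()(
    torch.cat([discriminator(real_x), discriminator(fake_x)]),
    torch.cat([real_y, fake_y]),
)
```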
“…A Regularization Perspective on Knowledge Selection. In practice, we consider that knowledge selection can act as a regularization which prevents co-adaptation (Grisogono, 2006; Sabiri et al., 2022) in KD, i.e., distilling a student model highly depends on a certain behavior of the teacher. If the distilled student model receives inappropriate knowledge from the dependent behavior of the teacher, it can significantly alter the performance of the student model, which is what might happen with overfitting (Hawkins, 2004; Phaisangittisagul, 2016).…”
Section: Performance on Different Student Models
confidence: 99%
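To ground the discussion, here is a minimal sketch of a standard Hinton-style distillation objective in PyTorch; the temperature T and mixing weight alpha are illustrative assumptions, not values from the cited work. Keeping nonzero weight on the ground-truth term is one simple way to limit how strongly the student co-adapts to a single teacher behavior.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft-target term: KL between temperature-softened teacher and
    # student distributions, rescaled by T^2 to balance gradient scale.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy on ground-truth labels;
    # a nonzero (1 - alpha) keeps the student from fitting only the teacher.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```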