Proceedings of the 24th International Conference on Enterprise Information Systems 2022
DOI: 10.5220/0011114900003179

Mechanism of Overfitting Avoidance Techniques for Training Deep Neural Networks

Cited by 13 publications (8 citation statements) | References 0 publications
“…After receiving a post-processed VAE-generated image $I$, this MLP output a label distribution $p(x|I)$ ($x = 0, 1, \cdots, 9$). In Figure 4e, $H_{\text{realistic}} = \mathbb{E}_I[-\sum_x p(x|I) \ln p(x|I)]$, where $\mathbb{E}_I[\cdot]$ means the average over all post-processed VAE-generated images [99]; in Figure 4f, $H_{\text{xcat}} = -\sum_x \mathbb{E}_I[p(x|I)] \ln \mathbb{E}_I[p(x|I)]$ [99]. To plot Figure 4g, we first chose the post-processed VAE-generated images with high realisticity (i.e., $\max_x p(x|I) > 0.9$); then, for all the images belonging to a category $x$, we calculated the variance $\lambda_i(x)$ along the $i$-th principal component (PC). $D_{\text{incat}}$ was defined as .…”
Section: Methods
confidence: 99%
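To make the two entropy measures concrete, here is a minimal NumPy sketch of how $H_{\text{realistic}}$ and $H_{\text{xcat}}$ could be computed from per-image label distributions; the array name `probs`, its shape, and the helper name are illustrative assumptions, not code from the cited paper.

```python
import numpy as np

def entropy_measures(probs, eps=1e-12):
    """probs: hypothetical (N, 10) array, probs[n, x] = p(x|I_n) from the MLP."""
    # H_realistic: mean per-image entropy of p(x|I); low values mean each
    # generated image is confidently assigned to one digit category.
    per_image_entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    h_realistic = per_image_entropy.mean()

    # H_xcat: entropy of the mean label distribution E_I[p(x|I)]; high
    # values mean the generated images cover the categories evenly.
    mean_dist = probs.mean(axis=0)
    h_xcat = -np.sum(mean_dist * np.log(mean_dist + eps))
    return h_realistic, h_xcat
```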
“…According to the authors of SSGAN (Salimans et al., 2016), “in practice, $L_{\text{unsup}}$ will only help if it is not trivial to minimize for our classifier and we thus need to train G to approximate the data distribution,” which explains that, while the $L_{\text{unsup}}$ of D and G converge at the same rhythm with CamemBERT and with ChouBERT-16, the troubled decrease of $L_{\text{sup}}$ with ChouBERT-16 renders worse F1 scores than those with CamemBERT. For example, in the group with 16 training examples (see Figure 6), the test F1 observation scores with ChouBERT-16 switch between 0 and 0.43, which means that the classifier predicts either all as non-observation or all as observation.…”
Section: Results and Evaluation
confidence: 99%
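For reference, the loss decomposition discussed above can be written down compactly. Below is a minimal PyTorch sketch of the discriminator-side $L_{\text{sup}}$ and $L_{\text{unsup}}$ from Salimans et al. (2016), using the common trick of fixing the logit of the fake class to 0 so that $D(x) = Z(x)/(Z(x)+1)$ with $Z(x) = \sum_k \exp l_k(x)$; all tensor names and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def ssgan_discriminator_losses(logits_labeled, labels,
                               logits_unlabeled, logits_fake):
    # L_sup: ordinary K-class cross-entropy on the labeled real examples.
    l_sup = F.cross_entropy(logits_labeled, labels)

    # With log Z(x) = logsumexp over the K class logits:
    #   log D(x)      = log Z - softplus(log Z)
    #   log(1 - D(x)) = -softplus(log Z)
    log_z_real = torch.logsumexp(logits_unlabeled, dim=1)
    log_z_fake = torch.logsumexp(logits_fake, dim=1)
    l_unsup = (-(log_z_real - F.softplus(log_z_real)).mean()
               + F.softplus(log_z_fake).mean())
    return l_sup, l_unsup
```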
“…Many variants of GANs are proposed to improve sample generation and the stability of training. Some of these variants are the conditional GANs (CGANs), where the generator is conditioned on one or more labels (Mirza and Osindero, 2014), and semi-supervised GANs (SS-GANs) (Salimans et al., 2016), where the discriminator is trained over its $k$-labeled examples plus the data generated by the generator as a new label “$k+1$” (see Figure 1).…”
Section: Introduction
confidence: 99%
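As a sketch of that “$k+1$” construction, the snippet below widens an ordinary $k$-way classifier head by one output and assigns every generator sample the extra class index; the layer sizes, class count, and random stand-in data are hypothetical.

```python
import torch
import torch.nn as nn

K = 10  # hypothetical number of real classes

# SS-GAN-style discriminator: K real classes plus one "generated" class.
discriminator = nn.Sequential(
    nn.Linear(784, 256),    # illustrative feature extractor for flat inputs
    nn.ReLU(),
    nn.Linear(256, K + 1),  # K real classes + 1 fake class
)

real_x, real_y = torch.randn(32, 784), torch.randint(0, K, (32,))
fake_x = torch.randn(32, 784)   # stand-in for generator output
fake_y = torch.full((32,), K)   # every generated sample gets label K

loss = nn.CrossEntropyLoss()(
    torch.cat([discriminator(real_x), discriminator(fake_x)]),
    torch.cat([real_y, fake_y]),
)
```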
“…A Regularization Perspective on Knowledge Selection. In practice, we consider that knowledge selection can act as a regularization which prevents co-adaptation (Grisogono, 2006; Sabiri et al., 2022) in KD, i.e., distilling a student model highly depends on a certain behavior of the teacher. If the distilled student model receives inappropriate knowledge from the dependent behavior of the teacher, it can significantly alter the performance of the student model, which is what might happen with overfitting (Hawkins, 2004; Phaisangittisagul, 2016).…”
Section: Performance on Different Student Models
confidence: 99%
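To ground the discussion, here is a minimal sketch of a standard Hinton-style distillation objective in PyTorch; the temperature T and mixing weight alpha are illustrative assumptions, not values from the cited work. Keeping nonzero weight on the ground-truth term is one simple way to limit how strongly the student co-adapts to a single teacher behavior.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft-target term: KL between temperature-softened teacher and
    # student distributions, rescaled by T^2 to balance gradient scale.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy on ground-truth labels;
    # a nonzero (1 - alpha) keeps the student from fitting only the teacher.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```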