Proceedings of the 22nd Conference on Computational Natural Language Learning 2018
DOI: 10.18653/v1/k18-1055

From Random to Supervised: A Novel Dropout Mechanism Integrated with Global Information

Abstract: Dropout is used to avoid overfitting by randomly dropping units from the neural networks during training. Inspired by dropout, this paper presents GI-Dropout, a novel dropout method integrating with global information to improve neural networks for text classification. Unlike the traditional dropout method in which the units are dropped randomly according to the same probability, we aim to use explicit instructions based on global information of the dataset to guide the training process. With GI-Dropout, the m…
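The contrast the abstract draws can be sketched in a few lines of NumPy: classic dropout uses one shared drop probability for every unit, while a GI-Dropout-style scheme assigns each unit its own drop probability derived from global dataset statistics (e.g. dropping "apparent", high-frequency sentiment words more often). This is an illustrative sketch only — the function names and the example probabilities are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def standard_dropout(x, p=0.5):
    """Classic dropout: every unit is dropped i.i.d. with the same
    probability p; survivors are scaled by 1/(1-p) (inverted dropout)."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def supervised_dropout(x, drop_probs):
    """GI-Dropout-style sketch: each unit gets its own drop probability.
    Here the probabilities are hypothetical; in the paper they are guided
    by global information about the dataset, so that over-learned
    'apparent' patterns are suppressed more aggressively."""
    drop_probs = np.asarray(drop_probs)
    mask = rng.random(x.shape) >= drop_probs
    return x * mask / (1.0 - drop_probs)

x = np.ones(4)
print(standard_dropout(x, 0.5))                       # entries are 0 or 2
print(supervised_dropout(x, [0.9, 0.1, 0.1, 0.5]))    # per-unit scaling
```

The key difference is only the shape of the drop-probability argument: a scalar for random dropout, a per-unit vector for the supervised variant.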

Cited by 7 publications (5 citation statements) · References 20 publications (38 reference statements)
“…It should be noted that our work is partially inspired by two recent models: one is (Wei et al, 2017) proposed to progressively mine discriminative object regions using classification networks to address the weakly-supervised semantic segmentation problems, and the other is (Xu et al, 2018) where a dropout method integrating with global information is presented to encourage the model to mine inapparent features or patterns for text classification. Nevertheless, to the best of our knowledge, our work is the first effort to explore automatic mining of attention supervision information for ABSA.…”
Section: Related Work
confidence: 99%
“…However, the existing attention mechanisms of ABSA models suffer from a major drawback, which is also seen in neural models of other NLP tasks. Specifically, NNs are easily affected by these two patterns: "apparent patterns" tend to be overly learned while "inapparent patterns" are not sufficiently learned (Li et al., 2018; Xu et al., 2018; Lin et al., 2017). "Apparent patterns" and "inapparent patterns" are widely present in the training corpus of ABSA, where "apparent patterns" are the high-frequency words with strong sentiment polarities while "inapparent patterns" are low-frequency sentiment-related words.…”
Section: Introduction
confidence: 99%
“…Our work is inspired by two recent models: one is proposed to progressively mine discriminative object regions using classification networks to address the weakly-supervised semantic segmentation problems, and the other one is (Xu et al, 2018) where a dropout method integrating with global information is presented to encourage the model to mine inapparent features or patterns for text classification. To the best of our knowledge, our work is the first one to explore automatic mining of attention supervision information for ASC.…”
Section: Related Work
confidence: 99%
“…Here, "apparent patterns" are interpreted as high-frequency words with strong sentiment polarities and "inapparent patterns" are referred to as low-frequency ones in training data. As mentioned in Xu et al, 2018;Lin et al, 2017), NNs are easily affected by these two modes: "apparent patterns" tend to be overly learned while "inapparent patterns" often can not be fully learned.…”
Section: Introduction
confidence: 99%
“…Because we don't want the NER model to just memorize the training dataset, the model must process the training dataset in batches and experiment with mini-batch sizes and dropout rates — a rate at which individual features and representations are randomly "dropped" [26] for a number of training iterations. In order to tune the accuracy for the mammogram NER model, we set the number of iterations = 1000 and the dropout rate = 0.35, which means that each feature or internal representation has a 0.35 likelihood of being dropped.…”
Section: Figure 3: NER Model Process
confidence: 99%
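The dropout-rate semantics in the statement above (each feature independently has a 0.35 chance of being zeroed at every training step) can be checked with a small simulation. This is a hedged sketch of the general mechanism, not the cited NER pipeline; the function and constant names are illustrative.

```python
import random

random.seed(42)

DROPOUT = 0.35   # rate from the quoted setup: each feature has a 0.35
N_ITER = 1000    # chance of being dropped at every training iteration

def apply_dropout(features, rate):
    """Zero each feature independently with probability `rate`."""
    return [0.0 if random.random() < rate else f for f in features]

# Over many iterations the observed fraction of dropped features
# converges to the configured rate.
dropped = 0
total = 0
for _ in range(N_ITER):
    out = apply_dropout([1.0] * 10, DROPOUT)
    dropped += out.count(0.0)
    total += 10
print(round(dropped / total, 2))  # a value close to 0.35
```

The point of the simulation is only that "dropout rate = 0.35" is a per-feature, per-iteration probability, not a fixed 35% subset of features chosen once.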