We propose a novel data augmentation method for labeled sentences called conditional BERT contextual augmentation. Data augmentation methods are often applied to prevent overfitting and improve generalization of deep neural network models. Recently proposed contextual augmentation augments labeled sentences by randomly replacing words with more varied substitutions predicted by language model. BERT demonstrates that a deep bidirectional language model is more powerful than either an unidirectional language model or the shallow concatenation of a forward and backward model. We retrofit BERT to conditional BERT by introducing a new conditional masked language model 1 task. The well trained conditional BERT can be applied to enhance contextual augmentation. Experiments on six various different text classification tasks show that our method can be easily applied to both convolutional or recurrent neural networks classifier to obtain obvious improvement.
Multi-turn retrieval-based conversation is an important task for building intelligent dialogue systems. Existing works mainly focus on matching candidate responses with every context utterance on multiple levels of granularity, which ignore the side effect of using excessive context information. Context utterances provide abundant information for extracting more matching features, but it also brings noise signals and unnecessary information.In this paper, we will analyze the side effect of using too many context utterances and propose a multi-hop selector network (MSN) to alleviate the problem. Specifically, MSN firstly utilizes a multi-hop selector to select the relevant utterances as context. Then, the model matches the filtered context with the candidate response and obtains a matching score. Experimental results show that MSN outperforms some state-of-the-art methods on three public multi-turn dialogue datasets.
Commonsense question answering aims to answer questions which require background knowledge that is not explicitly expressed in the question. The key challenge is how to obtain evidence from external knowledge and make predictions based on the evidence. Recent studies either learn to generate evidence from human-annotated evidence which is expensive to collect, or extract evidence from either structured or unstructured knowledge bases which fails to take advantages of both sources simultaneously. In this work, we propose to automatically extract evidence from heterogeneous knowledge sources, and answer questions based on the extracted evidence. Specifically, we extract evidence from both structured knowledge base (i.e. ConceptNet) and Wikipedia plain texts. We construct graphs for both sources to obtain the relational structures of evidence. Based on these graphs, we propose a graph-based approach consisting of a graph-based contextual word representation learning module and a graph-based inference module. The first module utilizes graph structural information to re-define the distance between words for learning better contextual word representations. The second module adopts graph convolutional network to encode neighbor information into the representations of nodes, and aggregates evidence with graph attention mechanism for predicting the final answer. Experimental results on CommonsenseQA dataset illustrate that our graph-based approach over both knowledge sources brings improvement over strong baselines. Our approach achieves the state-of-the-art accuracy (75.3%) on the CommonsenseQA dataset.
Scripts represent knowledge of event sequences that can help text understanding. Script event prediction requires to measure the relation between an existing chain and the subsequent event. The dominant approaches either focus on the effects of individual events, or the influence of the chain sequence. However, only considering individual events will lose much semantic relations within the event chain, and only considering the sequence of the chain will introduce much noise. With our observations, both the individual events and the event segments within the chain can facilitate the prediction of the subsequent event. This paper develops self attention mechanism to focus on diverse event segments within the chain and the event chain is represented as a set of event segments. We utilize the event-level attention to model the relations between subsequent events and individual events. Then, we propose the chain-level attention to model the relations between subsequent events and event segments within the chain. Finally, we integrate event-level and chain-level attentions to interact with the chain to predict what happens next. Comprehensive experiment results on the widely used New York Times corpus demonstrate that our model achieves better results than other state-of-the-art baselines by adopting the evaluation of Multi-Choice Narrative Cloze task.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.