We propose a novel data augmentation method for labeled sentences called conditional BERT contextual augmentation. Data augmentation methods are often applied to prevent overfitting and improve generalization of deep neural network models. Recently proposed contextual augmentation augments labeled sentences by randomly replacing words with more varied substitutions predicted by language model. BERT demonstrates that a deep bidirectional language model is more powerful than either an unidirectional language model or the shallow concatenation of a forward and backward model. We retrofit BERT to conditional BERT by introducing a new conditional masked language model 1 task. The well trained conditional BERT can be applied to enhance contextual augmentation. Experiments on six various different text classification tasks show that our method can be easily applied to both convolutional or recurrent neural networks classifier to obtain obvious improvement.
Multi-turn retrieval-based conversation is an important task for building intelligent dialogue systems. Existing works mainly focus on matching candidate responses with every context utterance on multiple levels of granularity, which ignore the side effect of using excessive context information. Context utterances provide abundant information for extracting more matching features, but it also brings noise signals and unnecessary information.In this paper, we will analyze the side effect of using too many context utterances and propose a multi-hop selector network (MSN) to alleviate the problem. Specifically, MSN firstly utilizes a multi-hop selector to select the relevant utterances as context. Then, the model matches the filtered context with the candidate response and obtains a matching score. Experimental results show that MSN outperforms some state-of-the-art methods on three public multi-turn dialogue datasets.
Commonsense question answering aims to answer questions which require background knowledge that is not explicitly expressed in the question. The key challenge is how to obtain evidence from external knowledge and make predictions based on the evidence. Recent studies either learn to generate evidence from human-annotated evidence which is expensive to collect, or extract evidence from either structured or unstructured knowledge bases which fails to take advantages of both sources simultaneously. In this work, we propose to automatically extract evidence from heterogeneous knowledge sources, and answer questions based on the extracted evidence. Specifically, we extract evidence from both structured knowledge base (i.e. ConceptNet) and Wikipedia plain texts. We construct graphs for both sources to obtain the relational structures of evidence. Based on these graphs, we propose a graph-based approach consisting of a graph-based contextual word representation learning module and a graph-based inference module. The first module utilizes graph structural information to re-define the distance between words for learning better contextual word representations. The second module adopts graph convolutional network to encode neighbor information into the representations of nodes, and aggregates evidence with graph attention mechanism for predicting the final answer. Experimental results on CommonsenseQA dataset illustrate that our graph-based approach over both knowledge sources brings improvement over strong baselines. Our approach achieves the state-of-the-art accuracy (75.3%) on the CommonsenseQA dataset.
The development of social media has revolutionized the way people communicate, share information and make decisions, but it also provides an ideal platform for publishing and spreading rumors. Existing rumor detection methods focus on finding clues from text content, user profiles, and propagation patterns. However, the local semantic relation and global structural information in the message propagation graph have not been well utilized by previous works.In this paper, we present a novel global-local attention network (GLAN) for rumor detection, which jointly encodes the local semantic and global structural information. We first generate a better integrated representation for each source tweet by fusing the semantic information of related retweets with the attention mechanism. Then, we model the global relationships among all source tweets, retweets, and users as a heterogeneous graph to capture the rich structural information for rumor detection. We conduct experiments on three real-world datasets, and the results demonstrate that GLAN significantly outperforms the state-ofthe-art models in both rumor detection and early detection scenarios.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.