Knowledge-grounded conversation models aim to generate informative responses for a given dialogue context based on external knowledge. To generate an informative and context-coherent response, it is important to combine the dialogue context and external knowledge in a balanced manner. However, existing studies have paid less attention to selecting appropriate knowledge sentences from external knowledge sources than to generating well-formed sentences with correct dialogue acts. In this paper, we propose two knowledge selection strategies, 1) Reduce-Match and 2) Match-Reduce, and explore several neural knowledge-grounded conversation models based on each strategy. Models based on the Reduce-Match strategy first distill the whole dialogue context into a single vector that preserves salient features, and then compare this context vector with representations of the knowledge sentences to predict the relevant knowledge sentence. Models based on the Match-Reduce strategy first match every turn of the context with the knowledge sentences to capture fine-grained interactions, and then aggregate these interactions while minimizing information loss to predict the knowledge sentence. Experimental results show that conversation models using each of our knowledge selection strategies outperform competitive baselines not only in knowledge selection accuracy but also in response generation performance. Our best Match-Reduce model outperforms the baselines on the Wizard of Wikipedia dataset, and our best Reduce-Match model outperforms them on the CMU Document Grounded Conversations dataset.
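The contrast between the two strategies can be sketched abstractly. The sketch below is illustrative only, not the paper's architecture: it assumes turn and knowledge-sentence embeddings are already given as vectors, and uses mean pooling and dot-product scoring as stand-ins for the learned reduction and matching components.

```python
import numpy as np

def reduce_match(turn_vecs, knowledge_vecs):
    """Reduce-Match: distill the dialogue context into one vector,
    then score each knowledge sentence against it."""
    context_vec = turn_vecs.mean(axis=0)         # reduce first (mean pooling as a stand-in)
    scores = knowledge_vecs @ context_vec        # match once against the reduced context
    return int(np.argmax(scores))                # index of the predicted knowledge sentence

def match_reduce(turn_vecs, knowledge_vecs):
    """Match-Reduce: match every turn against every knowledge sentence,
    then aggregate the fine-grained match scores."""
    match_matrix = turn_vecs @ knowledge_vecs.T  # match first: (turns, knowledge) interactions
    scores = match_matrix.max(axis=0)            # reduce afterwards (max pooling over turns)
    return int(np.argmax(scores))
```

The ordering is the point: Reduce-Match compresses before comparing and may lose turn-level detail, while Match-Reduce compares before compressing and keeps per-turn interactions available to the aggregation step.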
Although image-based CAPTCHAs were introduced to overcome the limited security of earlier text-based CAPTCHAs, image-based CAPTCHAs still suffer from problems such as user-unfriendly answer inference and wasted transmission cost. To address these issues, we propose a novel image-text fusion CAPTCHA model that uses a single image augmented with text hints, helping users guess the answers to the CAPTCHA problems more conveniently. According to the experimental results, the proposed CAPTCHA scheme achieves a higher correct-answer rate than the previous scheme, since the available text hints help users infer the correct answer for a given CAPTCHA image more easily.
Users on the internet usually converse about interesting facts or topics, drawing on diverse knowledge from the web. However, most existing knowledge-grounded conversation models consider only a single document related to the topic of a conversation. Recently proposed retrieval-augmented models generate a response based on multiple documents, but they ignore the given topic and use only the local context of the conversation. To address this, we introduce a novel retrieval-augmented response generation model that retrieves an appropriate range of documents relevant to both the topic and the local context of a conversation and uses them to generate a knowledge-grounded response. Our model first accepts both topic words extracted from the whole conversation and the tokens preceding the response to yield multiple representations. It then selects the representations of the first N tokens and those of keywords from the conversation and document encoders, and compares each group of conversation representations with the corresponding group of document representations. For training, we introduce a new data-weighting scheme that encourages the model to produce knowledge-grounded responses without ground-truth knowledge. Both automatic and human evaluation results on a large-scale dataset show that our models generate more knowledgeable, diverse, and relevant responses than state-of-the-art models.
The importance of semantic similarity measures between sentences is growing in text mining, text clustering, and question answering. Many studies have focused on exact term matching to predict sentence similarity. In this paper, we present a method for measuring the semantic similarity of sentences based on a constructed synonymy graph, which avoids relying only on exactly matching terms. We construct a graph with terms as nodes and synonymy relations as edges, using WordNet and part-of-speech information to extract synonyms. We assume that a synonym of a synonym is also similar, exploiting the fact that, in the real world, a friend of a friend is likely to be a friend as well. With this idea, the similarity between words is estimated from the minimum number of synonym links in a chain between two nodes. The proposed algorithm calculates the similarity of two sentences by summing the similarities between selected words in the sentences. Evaluation is conducted on two different datasets: the Microsoft Research Paraphrase Corpus and a Yelp review dataset. Experimental evidence shows that 1) the proposed method is more accurate than existing sentence similarity measures, and 2) results on a real-world dataset such as Yelp suggest that the proposed method could also be applied to recommendation.
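The synonym-chain idea above can be sketched as follows. This is a minimal illustration, not the paper's implementation: a tiny hand-written synonym list stands in for WordNet, and the word-similarity decay `1 / (chain length + 1)` is an assumed scoring choice.

```python
from collections import deque

# Toy synonym lexicon standing in for WordNet (illustrative only).
SYNONYMS = [("happy", "glad"), ("glad", "joyful"), ("big", "large")]

def build_graph(pairs):
    """Terms as nodes, synonymy relations as undirected edges."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def chain_length(graph, src, dst):
    """BFS for the minimum number of synonym links between two terms."""
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no synonym chain connects the two terms

def word_similarity(graph, w1, w2):
    d = chain_length(graph, w1, w2)
    return 0.0 if d is None else 1.0 / (d + 1)  # assumed decay with chain length

def sentence_similarity(graph, words1, words2):
    """Sum the best word-level similarity for each selected word in sentence 1."""
    return sum(max((word_similarity(graph, w, v) for v in words2), default=0.0)
               for w in words1)
```

For example, with the toy lexicon, "happy" reaches "joyful" through the chain happy-glad-joyful (two links), so the pair still contributes to the sentence score even though the terms never match exactly.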