This paper presents the pilot version of the first RST discourse treebank for Russian. The project started in 2016. At present, the treebank consists of sixty news texts annotated for rhetorical relations according to the RST scheme. However, the scheme was slightly modified in order to achieve a higher inter-annotator agreement score. During the annotation procedure, we also registered discourse connectives of different types and mapped them onto the corresponding rhetorical relations. In the present paper, we discuss our experience of adapting the RST scheme to Russian news texts. In addition, we report on the distribution of the most frequent discourse connectives in our corpus.
Standard image caption generation systems produce generic descriptions of images and do not utilize any contextual information or world knowledge. In particular, they are unable to generate captions that contain references to the geographic context of an image, for example, the location where a photograph is taken or relevant geographic objects around an image location. In this paper, we develop a geo-aware image caption generation system, which incorporates geographic contextual information into a standard image captioning pipeline. We propose a way to build an image-specific representation of the geographic context and adapt the caption generation network to produce appropriate geographic names in the image descriptions. We evaluate our system on a novel captioning dataset that contains contextualized captions and geographic metadata and achieve substantial improvements in BLEU, ROUGE, METEOR and CIDEr scores. We also introduce a new metric to assess generated geographic references directly and empirically demonstrate our system's ability to produce captions with relevant and factually accurate geographic referencing.
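The abstract above describes building an image-specific representation of the geographic context from an image's location metadata. One plausible first step is selecting gazetteer entities near the image's GPS coordinates; the sketch below illustrates that idea only. The gazetteer format, the radius threshold, and the function names are illustrative assumptions, not the paper's actual implementation.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def entity_context(image_lat, image_lon, gazetteer, radius_km=1.0):
    """Select geographic entities within radius_km of the image location,
    sorted nearest-first. `gazetteer` maps entity name -> (lat, lon)."""
    nearby = [
        (name, haversine_km(image_lat, image_lon, lat, lon))
        for name, (lat, lon) in gazetteer.items()
    ]
    return [name for name, d in sorted(nearby, key=lambda x: x[1]) if d <= radius_km]
```

Entities selected this way could then be encoded as extra input and exposed to the decoder as candidate names, which is the role the abstract assigns to the geographic context.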
This dissertation is dedicated to image captioning, the task of automatically generating a natural language description of a given image. Most modern automatic caption generators are trained to produce a straightforward visual description of what can be directly seen in the image. By contrast, a human-written caption may also include information that cannot be inferred from the image alone: references to image-external world knowledge. Exploring ways to enrich automatic image captioning with contextually relevant external knowledge is the main focus of this dissertation. The general approach we develop begins with the identification and extraction of relevant external knowledge. This task is carried out by a contextualization anchor, an element of image-related data that is used to determine which part of the world knowledge available in external resources would be useful for captioning a given image. Through the contextualization anchor, we identify real world entities that are relevant to the image, which make up an entity context. We further retrieve various facts about these entities, creating an informative knowledge context. We integrate both entity and knowledge contexts into a neural encoder-decoder captioning pipeline as extra sources of information for generating the caption. The goal of the resulting “knowledge-aware” captioning model is to generate captions that are influenced by the relevant external knowledge and possibly include explicit references to it. During evaluation, we pay special attention to measuring factual accuracy, the veridicality of image-external knowledge in the automatically generated captions. Based on this approach, we develop three image captioning models. Their training data, which includes two new datasets we compile, contains naturally produced captions with abundant references to external knowledge. The first model focuses on geographic knowledge in particular. 
It uses image location metadata as a contextualization anchor to identify geographic entities in and around the image. These entities make up the geographic entity context, which provides extra input for the encoder and an additional vocabulary for the decoder, allowing it to generate entity names in the captions. The evaluation shows a substantial improvement over the standard baseline models, particularly in the ability to correctly produce specific geographic references. The second model additionally includes the knowledge context, which consists of diverse encyclopedic facts about the relevant entities. It is used as another input to the encoder, and in the decoder it provides extra contextualization for the generation of regular words and another vocabulary for generating fact-related tokens. In our experiments, this model consistently outperforms various baseline models in standard captioning metrics and, importantly, in the accuracy of the generated facts. The third model extends beyond the geographic domain and applies our approach to qualitatively different data: images from newspaper articles. Here, the article itself is used as a contextualization anchor, the entity context is constructed from named entities of various types (not only geographic) collected from the article text, and the knowledge context includes encyclopedic facts about these entities. The resulting model is able to generate contextualized captions that incorporate information from both the article and an external knowledge base.
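The abstract above repeatedly describes giving the decoder "an additional vocabulary" of entity names and fact-related tokens on top of the regular word vocabulary. A minimal sketch of that bookkeeping step, assuming a simple token-to-id dictionary and per-image dynamic slots (the function name and data layout are hypothetical, not taken from the dissertation):

```python
def build_decoder_vocab(base_vocab, entity_context, knowledge_context):
    """Extend a static word vocabulary (token -> id) with per-image dynamic
    id slots for entity names and fact-related tokens, so the decoder can
    emit them alongside regular words."""
    vocab = dict(base_vocab)
    next_id = max(vocab.values()) + 1
    for token in list(entity_context) + list(knowledge_context):
        if token not in vocab:  # entities already in the base vocab keep their id
            vocab[token] = next_id
            next_id += 1
    return vocab
```

In an actual encoder-decoder model the dynamic slots would typically be scored by a copy or pointer mechanism rather than a fixed softmax, but the id bookkeeping is the same.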