This paper proposes an end-to-end Natural Language Generation approach that automatically creates fiction stories using statistical language models. The approach integrates two stages: macroplanning, which determines the content to write about together with the structure of the story, and surface realisation, which determines the syntactic and lexical realisation of the sentences to be generated. Moreover, using language models within these stages makes the generation task more flexible in terms of adapting the approach to different languages, domains and textual genres. To validate the approach, two evaluations were performed. First, we evaluated the influence of integrating position-specific language modelling from the macroplanning stage into the surface realisation module. Second, a user evaluation was carried out to analyse the generated stories qualitatively. Although there is still room for improvement, the results of the first evaluation, together with the feedback from the user evaluation, show that combining these stages in an end-to-end approach is appropriate and has positive effects on the generated text.
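The abstract does not detail how the position-specific language models are built or applied. As a hedged illustration only, the Python sketch below assumes a simple scheme: story sentences are split into hypothetical `begin`/`middle`/`end` buckets, one bigram model with add-one (Laplace) smoothing is trained per bucket, and candidate sentences for a given story position are ranked by their length-normalised log-probability under that bucket's model. The bucket scheme, the function names (`position_bucket`, `train_position_models`, `choose_realisation`) and the toy corpus are assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"


def position_bucket(index: int, total: int) -> str:
    """Map a sentence index to a coarse story position (hypothetical 3-way split)."""
    if index == 0:
        return "begin"
    if index == total - 1:
        return "end"
    return "middle"


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.history = Counter()  # how often each token occurs as a left context
        self.bigrams = Counter()  # how often each (previous, current) pair occurs
        self.vocab = {EOS}        # every token the model may predict

    @staticmethod
    def _tokens(sentence: str) -> list:
        return [BOS] + sentence.lower().split() + [EOS]

    def train(self, sentence: str) -> None:
        tokens = self._tokens(sentence)
        self.vocab.update(tokens[1:])
        for prev, word in zip(tokens, tokens[1:]):
            self.history[prev] += 1
            self.bigrams[(prev, word)] += 1

    def score(self, sentence: str) -> float:
        """Average log-probability per predicted token (length-normalised)."""
        tokens = self._tokens(sentence)
        v = len(self.vocab) + 1  # +1 reserves probability mass for an unknown word
        logp = sum(
            math.log((self.bigrams[(prev, word)] + 1) / (self.history[prev] + v))
            for prev, word in zip(tokens, tokens[1:])
        )
        return logp / (len(tokens) - 1)


def train_position_models(stories):
    """Train one bigram model per (hypothetical) story position bucket."""
    models = defaultdict(BigramLM)
    for story in stories:
        for i, sentence in enumerate(story):
            models[position_bucket(i, len(story))].train(sentence)
    return models


def choose_realisation(models, candidates, index, total):
    """Pick the candidate sentence that best fits its position in the story."""
    model = models[position_bucket(index, total)]
    return max(candidates, key=model.score)


stories = [
    ["once upon a time there lived a king", "the king had a daughter",
     "they lived happily ever after"],
    ["once upon a time there was a fox", "the fox was hungry",
     "and they lived happily ever after"],
]
models = train_position_models(stories)
candidates = ["they lived happily ever after", "once upon a time there was a queen"]
print(choose_realisation(models, candidates, index=0, total=3))  # once upon a time there was a queen
print(choose_realisation(models, candidates, index=2, total=3))  # they lived happily ever after
```

Length normalisation is a deliberate choice here: raw summed log-probabilities favour shorter candidates, which in this toy corpus would otherwise let the closing formula win even at the start of the story.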
Summary. Human beings communicate and express themselves through language. To do so, they must develop a set of high-level cognitive skills whose complexity becomes evident when attempting to automate the process, both when producing language and when interpreting it. When communication takes place between a person and a computer and the computer is the recipient, computational languages are used which, as a general rule, are subject to a set of strongly typed, bounded and unambiguous rules. However, when communication flows in the opposite direction and the machine must convey information to the person in natural language, the generation procedure must deal with the flexibility and ambiguity that characterise natural language, which makes the task highly complex. Computational Linguistics techniques are needed for machines to be able to handle human language. Within this discipline, the field concerned with creating natural language texts is called Natural Language Generation (NLG). This article presents an exhaustive overview of this field. It describes the stages into which NLG systems are usually decomposed, together with the techniques applied in them, and analyses in detail the current situation of this research area and its open problems, as well as the most relevant resources and the techniques being used to evaluate the quality of these systems.

Keywords. Computational linguistics, natural language generation, NLG, stages, techniques, evaluation.

Natural Language Generation: Revision of the State of the Art

Abstract. Language is one of the highest-level cognitive skills developed by human beings and, therefore, processing it is one of the most complex tasks to tackle from a computational perspective.
Human-computer communication involves two different degrees of difficulty depending on the nature of that communication. If the language used is oriented towards the domain of the machine, there is no room for ambiguity, since the language is constrained by rules. However, when communication takes place in natural language, its flexibility and ambiguity become unavoidable. Computational Linguistics techniques are essential for machines to process human language. Within this discipline, the area of Natural Language Generation aims to develop techniques for automatically producing human utterances, whether text or speech. This paper presents an in-depth survey of this research area, taking into account different perspectives on its theories, methodologies, architectures, techniques and evaluation approaches, thus providing a review of the current situation of the field and of possible future research directions.

Keywords. Computational linguistics, natural language generation, NLG, stages, techniques, evaluation.

Introduction

Computational Linguistics (CL) is a field in which several disciplines converge: applied linguistics, computer science and artifici...
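The survey summarised above describes the stages into which NLG systems are usually decomposed. As a rough, hedged illustration of that decomposition, the Python sketch below chains three toy stages commonly distinguished in the literature: document planning (macroplanning), microplanning and surface realisation. The `Fact` record, the importance scores, the `LEXICON` table and the article rule are hypothetical simplifications chosen for brevity; real systems use much richer representations and techniques at each stage.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    """Hypothetical input record: who does what to whom, and how important it is."""
    subject: str
    predicate: str
    obj: str
    importance: int


# Hypothetical lexicon mapping abstract predicates to inflected English verbs.
LEXICON = {"live_in": "lives in", "find": "finds", "eat": "eats"}


def document_planning(facts, max_facts=2):
    """Macroplanning: select the most important facts (content determination)
    and keep them in their input order, e.g. chronological (structuring)."""
    ranked = sorted(range(len(facts)), key=lambda i: facts[i].importance, reverse=True)
    return [facts[i] for i in sorted(ranked[:max_facts])]


def refer(entity, seen):
    """Referring expression: indefinite on first mention, definite afterwards."""
    article = "the" if entity in seen else "a"
    seen.add(entity)
    return f"{article} {entity}"


def microplanning(plan):
    """Lexicalisation and referring-expression choice for each planned fact."""
    seen = set()
    return [
        (refer(f.subject, seen), LEXICON[f.predicate], refer(f.obj, seen))
        for f in plan
    ]


def surface_realisation(specs):
    """Linearise (subject, verb, object) specs and apply basic orthography."""
    sentences = [f"{s} {v} {o}." for s, v, o in specs]
    return " ".join(s[0].upper() + s[1:] for s in sentences)


facts = [
    Fact("fox", "live_in", "forest", importance=3),
    Fact("fox", "eat", "berry", importance=1),
    Fact("fox", "find", "treasure", importance=2),
]
text = surface_realisation(microplanning(document_planning(facts)))
print(text)  # A fox lives in a forest. The fox finds a treasure.
```

Keeping each stage as a separate function mirrors the classic pipeline view, in which each stage consumes only the output of the previous one.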