This paper contains a large literature review in the research field of Text Summarisation (TS) based on Human Language Technologies (HLT). TS helps users manage the vast amount of information available, by condensing documents' content and extracting the most relevant facts or topics included in them. The rapid development of emerging technologies poses new challenges to this research field, which still need to be solved. Therefore, it is essential to analyse its progress over the years, and provide an overview of the past, present and future directions, highlighting the main advances achieved and outlining remaining limitations. With this purpose, several important aspects are addressed within the scope of this survey. On the one hand, the paper aims at giving a general perspective on the state-ofthe-art, describing the main concepts, as well as different summarisation approaches, and relevant international forums. Furthermore, it is important to stress upon the fact that the birth of new requirements and scenarios has led to new types of summaries with specific purposes (e.g. sentiment-based summaries), and novel domains within which TS has proven to be also suitable for (e.g. blogs). In addition, TS is successfully combined with a number of intelligent systems based on HLT (e.g. information retrieval, question answering, and text classification). On the other hand, a deep study of the evaluation of summaries is also conducted in this paper, where the existing methodologies and systems are explained, as well as new research that has emerged concerning the automatic evaluation of summaries' quality. Finally, some thoughts about TS in general and its future will encourage the reader to think of novel approaches, applications and lines to conduct research in the next years. The analysis of these issues allows the reader to have a wide and useful background on the main important aspects of this research field.
Evaluation is crucial in the research and development of automatic summarization applications, in order to determine the appropriateness of a summary based on different criteria, such as the content it contains, and the way it is presented. To perform an adequate evaluation is of great relevance to ensure that automatic summaries can be useful for the context and/or application they are generated for. To this end, researchers must be aware of the evaluation metrics, approaches, and datasets that are available, in order to decide which of them would be the most suitable to use, or to be able to propose new ones, overcoming the possible limitations that existing methods may present. In this article, a critical and historical analysis of evaluation metrics, methods, and datasets for automatic summarization systems is presented, where the strengths and weaknesses of evaluation efforts are discussed and the major challenges to solve are identified. Therefore, a clear up-to-date overview of the evolution and progress of summarization evaluation is provided, giving the reader useful insights into the past, present and latest trends in the automatic evaluation of summaries.
A new approach to narrative abstractive summarization (NATSUM) is presented in this paper. NATSUM is centered on generating a narrative chronologically ordered summary about a target entity from several news documents related to the same topic. To achieve this, first, our system creates a cross-document timeline where a time point contains all the event mentions that refer to the same event. This timeline is enriched with all the arguments of the events that are extracted from different documents. Secondly, using natural language generation techniques, one sentence for each event is produced using the arguments involved in the event. Specifically, a hybrid surface realization approach is used, based on over-generation and ranking techniques. The evaluation demonstrates that NATSUM performed better than extractive summarization approaches and competitive abstractive baselines, improving the F1-measure at least by 50%, when a real scenario is simulated.
The Web 2.0 has resulted in a shift as to how users consume and interact with the information, and has introduced a wide range of new textual genres, such as reviews or microblogs, through which users comunicate, exchange, and share opinions. The explotation of all this user-generated content is of great value both for users and companies, in order to assist them in their decision-making processes. Given this context, the analysis and development of automatic methods that can help manage online information in a quicker manner are needed.Therefore, this article proposes and evaluates a novel concept-level approach for ultra-concise opinion abstractive summarization. Our approach is characterized by the integration of syntactic sentence simplification, sentence regeneration and internal concept representation into the summarization process, thus being able to generate abstractive summaries, which is one the most challenging issues for this task. In order to be able to analyze different settings for our approach, the use of the sentence regeneration module was made optional, leading to two different versions of the system (one with sentence regeneration and one without). For testing them, a corpus of 400 English texts, gathered from reviews and tweets belonging to two different domains, was used. Although both versions were shown to be reliable methods for generating this type of summaries, the results obtained indicate that the version without sentence regeneration yielded to better results, improving the results of a number of stateof-the-art systems by 9%, whereas the version with sentence regeneration proved to be more robust to noisy data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations鈥揷itations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright 漏 2024 scite LLC. All rights reserved.
Made with 馃挋 for researchers
Part of the Research Solutions Family.