The availability of large document–summary corpora has opened up new possibilities for applying statistical text generation techniques to abstractive summarization. Progress in extractive text summarization has stagnated for some time now, and in this work we compare two possible alternatives to it. We present an argument in favor of abstractive summarization over an ensemble of extractive techniques. We further explore the possibility of using statistical machine translation as a generative text summarization technique and present possible research questions in this direction. We also report our initial findings and the future direction of our research.
Motivation for proposed research

Extractive techniques of text summarization have long been the primary focus of research compared to abstractive techniques. But recent reports suggest that advances in extractive text summarization have slowed down in the past few years (Nenkova and McKeown, 2012). Only marginal improvements are being reported over previous techniques, and more often than not these seem to result from variations in the parameters used during ROUGE evaluation, and sometimes from other factors, such as a better redundancy removal module (generally applied after the sentences are ranked by importance), rather than from the actual algorithm. Overall, the current state-of-the-art techniques for extractive summarization appear to have more or less reached their peak performance, with only small further improvements still achievable. In such a scenario there seem to be two possible directions for further research. One approach is to build an ensemble of these techniques, which might prove better than the individual methods. The other option is to focus on abstractive techniques instead.

A large number of extractive summarization techniques have been developed in the past decade, especially after the advent of venues like the Document Understanding Conference (DUC) and the Text Analysis Conference (TAC). But very few inquiries have been made into how these techniques differ from each other and which salient features of some are absent in others. (Hong et al., 2014) is the first such attempt to compare summaries beyond merely comparing ROUGE (Lin, 2004) scores. They show that many systems, despite having similar ROUGE scores, produce very different content and have little overlap among themselves.
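The observation above can be made concrete with a minimal sketch. The snippet below uses a simplified unigram-recall proxy for ROUGE-1 (the real ROUGE toolkit applies stemming, stopword options, and n-gram variants) and a Jaccard measure of content overlap between two system summaries; the example sentences are invented for illustration. Two summaries can score identically against the reference while sharing far less content with each other.

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """Simplified ROUGE-1: unigram recall of a candidate against a reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(1, sum(ref.values()))

def content_overlap(summary_a, summary_b):
    """Jaccard overlap of unigrams between two system summaries."""
    a, b = set(summary_a.lower().split()), set(summary_b.lower().split())
    return len(a & b) / max(1, len(a | b))

# Toy reference and two hypothetical system summaries.
ref = "the storm closed schools and roads across the region"
sys_a = "the storm closed schools across the region on monday"
sys_b = "roads across the region were closed by the heavy storm"

# Both systems achieve the same recall against the reference (7/9 each)...
print(rouge1_recall(sys_a, ref), rouge1_recall(sys_b, ref))
# ...yet share only 5 of 12 distinct unigrams with one another.
print(content_overlap(sys_a, sys_b))
```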
This difference, at least in principle, opens up the possibility of combining these summaries at various levels to get a better result: fusing rank lists (Wang and Li, 2012), choosing the best combination of sentences from several summaries (Hong et al., 2015), or using learning-to-rank techniques to generate rank lists of sentences and then choosing the top-k sentences as a summary. In the next section we report our initial experiments and show that a meaningful ensemble of these techniques can help improve the coverage of existing techniques. But such an improvement is not always guaranteed.
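One concrete instance of the rank-list fusion mentioned above is a simple Borda count, sketched below. This is only an illustrative assumption, not the specific fusion method of the cited work: each system ranks the same pool of candidate sentences, each rank position awards points, and the fused top-k is taken as the ensemble summary.

```python
from collections import defaultdict

def borda_fuse(rank_lists, k):
    """Fuse several sentence rank lists with a Borda count and return the
    top-k sentence ids (illustrative fusion step, not a specific published method)."""
    scores = defaultdict(float)
    for ranking in rank_lists:
        n = len(ranking)
        for pos, sent_id in enumerate(ranking):
            scores[sent_id] += n - pos  # first place gets n points, last gets 1
    # Sort by descending score, breaking ties by sentence id for determinism.
    fused = sorted(scores, key=lambda s: (-scores[s], s))
    return fused[:k]

# Three hypothetical extractive systems ranking the same five sentences (ids 0-4).
system_a = [0, 2, 1, 3, 4]
system_b = [2, 0, 3, 1, 4]
system_c = [1, 2, 0, 4, 3]

print(borda_fuse([system_a, system_b, system_c], k=3))  # -> [2, 0, 1]
```

Sentence 2 wins here because it is ranked highly by all three systems, even though only one system places it first; this is exactly the kind of consensus effect an ensemble hopes to exploit.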