Enhancing diversity, coverage and balance for summarization through structure learning

Li, Liangda; Zhou, Ke; Xue, Gui-Rong; Zha, Hongyuan; Yu, Yong

doi:10.1145/1526709.1526720

Cited by 99 publications

(80 citation statements)

References 22 publications


“…DivRank (Mei et al., 2010) is a generic graph ranking model that aims to balance high information coverage and low redundancy in top ranking vertices, which are also two key requirements for choosing salient summarization sentences (Li et al., 2009; Liu et al., 2015). Based on that, we present a model to rank and select salient messages from leader set V_L to form a summary.…”

Section: Basic-leadsum Model (mentioning)

confidence: 99%

Using Content-level Structures for Summarizing Microblog Repost Trees

Gao, Wei, et al. 2015

Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing


A microblog repost tree provides strong clues on how an event described therein develops. To help social media users capture the main clues of events on microblogging sites, we propose a novel repost tree summarization framework by effectively differentiating two kinds of messages on repost trees called leaders and followers, which are derived from content-level structure information, i.e., contents of messages and the reposting relations. To this end, a Conditional Random Fields (CRF) model is used to detect leaders across repost tree paths. We then present a variant of random-walk-based summarization model to rank and select salient messages based on the result of leader detection. To reduce the error propagation cascaded from leader detection, we improve the framework by enhancing the random walk with adjustment steps for sampling from leader probabilities given all the reposting messages. For evaluation, we construct two annotated corpora, one for leader detection, and the other for repost tree summarization. Experimental results confirm the effectiveness of our method.


“…Subtopic coverage [29], max-marginal relevance (MMR) [4] and submodular coverage [17,16] are examples of this paradigm where the marginal utility is designed by hand. SVMdiv [28] and IndStrSVM [15] learn the marginal utility of subtopic coverage of documents from training data.…”

Section: Prior Art (mentioning)

confidence: 99%

“…In the learning-to-rank literature, Yue and Joachims [28] proposed a structured learning framework SVMdiv for diverse topic coverage, by using features that capture word coverage signals as surrogates of topic coverage. IndStrSVM [15] proposes additional constraints to encourage diversity and balance appropriate for the specific application of summarization. SVMdiv and IndStrSVM stand out as among very few diversity approaches that learn from a powerful hypothesis space.…”

Section: Subtopic Coverage (mentioning)

confidence: 99%

Diversity in ranking via resistive graph centers

Dubey, Chakrabarti, Bhattacharyya 2011

Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining


Users can rarely reveal their information need in full detail to a search engine within 1-2 words, so search engines need to "hedge their bets" and present diverse results within the precious 10 response slots. Diversity in ranking is of much recent interest. Most existing solutions estimate the marginal utility of an item given a set of items already in the response, and then use variants of greedy set cover. Others design graphs with the items as nodes and choose diverse items based on visit rates (PageRank). Here we introduce a radically new and natural formulation of diversity as finding centers in resistive graphs. Unlike in PageRank, we do not specify the edge resistances (equivalently, conductances) and ask for node visit rates. Instead, we look for a sparse set of center nodes so that the effective conductance from the center to the rest of the graph has maximum entropy. We give a cogent semantic justification for turning PageRank thus on its head. In marked deviation from prior work, our edge resistances are learnt from training data. Inference and learning are NP-hard, but we give practical solutions. In extensive experiments with subtopic retrieval, social network search, and document summarization, our approach convincingly surpasses recently-published diversity algorithms like subtopic cover, max-marginal relevance (MMR), Grasshopper, DivRank, and SVMdiv.


“…Although this is challenging even with modern natural language processing techniques, a combination of techniques has proven to be effective, e.g. [9,20], and offers an approximation for the amount of similarity and thus redundancy between two sentences.…”

Section: Measuring Redundancy Via Semantic Similarity (mentioning)

confidence: 99%

Redundancy and Collaboration in Wikibooks

Liccardi, Chapuis, Yeung, et al. 2011

Human-Computer Interaction – INTERACT 2011


Abstract. This paper investigates how Wikibooks authors collaborate to create high-quality books. We combined Information Retrieval and statistical techniques to examine the complete multi-year lifecycle of over 50 high-quality Wikibooks. We found that: 1. The presence of redundant material is negatively correlated with collaboration mechanisms; 2. For most books, over 50% of the content is written by a small core of authors; and 3. Use of collaborative tools (predicted pages and talk pages) is significantly correlated with patterns of redundancy. Non-redundant books are well-planned from the beginning and require fewer talk pages to reach high-quality status. Initially redundant books begin with high redundancy, which drops as soon as authors use coordination tools to restructure the content. Suddenly redundant books display sudden bursts of redundancy that must be resolved, requiring significantly more discussion to reach high-quality status. These findings suggest that providing core authors with effective tools for visualizing and removing redundant material may increase writing speed and improve the book's ultimate quality.
