Extractive summarization techniques cannot generate document summaries shorter than a single sentence, something that is often required. An ideal summarization system would understand each document and generate an appropriate summary directly from the results of that understanding. A more practical approach approximates this ideal by viewing summarization as a problem analogous to statistical machine translation: the task becomes generating a target document in a more concise language from a source document in a more verbose language. This paper presents experimental results for this approach, in which statistical models of term selection and term ordering are jointly applied to produce summaries in a style learned from a training corpus.
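The joint application of a term-selection model and a term-ordering model can be sketched as follows. This is a minimal illustration, not the paper's implementation: the probability tables are invented toy values standing in for learned parameters, and the exhaustive search stands in for a Viterbi-style decoder.

```python
# Sketch of the statistical headline-generation idea: a term-selection model
# (how likely a document word is to appear in the summary) combined with a
# term-ordering model (a bigram language model over summary words).
# All probabilities below are toy values, not learned ones.
from itertools import permutations

p_select = {"stocks": 0.9, "fell": 0.8, "sharply": 0.6, "today": 0.3}
p_bigram = {("<s>", "stocks"): 0.5, ("stocks", "fell"): 0.6,
            ("fell", "sharply"): 0.4, ("sharply", "</s>"): 0.5}

def score(summary):
    """Joint score: selection likelihood of each chosen term times the
    bigram ordering likelihood of the resulting sequence."""
    s = 1.0
    for w in summary:
        s *= p_select.get(w, 1e-6)
    seq = ["<s>"] + list(summary) + ["</s>"]
    for a, b in zip(seq, seq[1:]):
        s *= p_bigram.get((a, b), 1e-6)
    return s

def best_summary(doc_words, length):
    """Exhaustively search orderings of candidate terms (fine for a toy
    vocabulary; a real decoder would use beam or Viterbi search)."""
    return max(permutations(doc_words, length), key=score)

print(best_summary(["today", "stocks", "sharply", "fell"], 3))
# -> ('stocks', 'fell', 'sharply')
```

The key point the sketch preserves is that selection and ordering are scored jointly, so a summary is preferred only when its words are both individually likely and fluently ordered.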
This paper discusses a text extraction approach to multi-document summarization that builds on single-document summarization methods by using additional, available information about the document set as a whole and the relationships between the documents. Multi-document summarization differs from single-document summarization in that the issues of compression, speed, redundancy and passage selection are critical to the formation of useful summaries. Our approach addresses these issues by using domain-independent techniques based mainly on fast, statistical processing, a metric for reducing redundancy and maximizing diversity in the selected passages, and a modular framework that allows easy parameterization for different genres, corpus characteristics and user requirements.
This paper discusses passage extraction approaches to multi-document summarization that use available information about the document set as a whole and the relationships between the documents to build on single-document summarization methodology. Multi-document summarization differs from single-document summarization in that the issues of compression, speed, redundancy and passage selection are critical to the formation of useful summaries, as is the user's goal in creating the summary. Our approach addresses these issues by using domain-independent techniques based mainly on fast, statistical processing, a metric for reducing redundancy and maximizing diversity in the selected passages, and a modular framework that allows easy parameterization for different genres, corpus characteristics and user requirements. We examined how humans create multi-document summaries as well as the characteristics of such summaries, and we use these human-generated summaries to evaluate the performance of various multi-document summarization algorithms.
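A metric that trades off relevance against redundancy with already-selected passages can be sketched in the style of Maximal Marginal Relevance (MMR). This is an illustrative sketch, not the system described above: the word-overlap similarity and the weight lam=0.5 are assumptions standing in for whatever similarity metrics and parameters the actual system uses.

```python
# MMR-style greedy passage selection: each step picks the passage that
# maximizes lam * sim(passage, query) - (1 - lam) * max sim(passage, selected),
# rewarding query relevance while penalizing redundancy.

def overlap_sim(a, b):
    """Jaccard word overlap between two passages (a simple stand-in for
    the system's similarity metric)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def mmr_select(passages, query, k, lam=0.5):
    """Greedily select k passages, balancing relevance and diversity."""
    selected = []
    remaining = list(passages)
    while remaining and len(selected) < k:
        def mmr_score(p):
            rel = overlap_sim(p, query)
            red = max((overlap_sim(p, s) for s in selected), default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

docs = ["the fire spread across the hills",
        "the fire spread across the hills overnight",
        "officials ordered an evacuation of nearby towns"]
print(mmr_select(docs, "fire spread in the hills", 2))
```

Note how the redundancy penalty causes the second pick to skip the near-duplicate passage in favor of the novel one, even though the duplicate is more query-relevant on its own.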
This paper describes a machine learning approach to building an efficient and accurate name spotting system. Finding names in free text is an important task in many text-based applications. Most previous approaches were based on hand-crafted modules encoding language- and genre-specific knowledge. These approaches had at least two shortcomings: they required large amounts of time and expertise to develop, and they were not easily portable to new languages and genres. This paper describes an extensible system that automatically combines weak evidence from different, easily available sources: part-of-speech tags, dictionaries, and surface-level syntactic information such as capitalization and punctuation. Individually, each piece of evidence is insufficient for robust name detection. However, the combination of evidence, through standard machine learning techniques, yields a system that achieves performance equivalent to the best existing hand-crafted approaches.
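The idea of combining individually weak evidence sources can be illustrated as follows. Everything here is a toy construction, not the paper's system: the word lists, features, hand-set weights, and threshold are invented stand-ins for the learned combination described above.

```python
# Toy sketch of weak-evidence combination for name spotting. Each feature
# (capitalization, dictionary membership, sentence position, common-word
# filter) is unreliable alone; a weighted vote combines them. A real
# system would learn the weights from labeled data.

COMMON_WORDS = {"the", "yesterday", "said", "in"}   # toy common-word list
FIRST_NAMES = {"john", "mary", "sundar"}            # toy name dictionary

def evidence(token, prev_token):
    """Binary evidence features from easily available sources."""
    return {
        "capitalized": token[:1].isupper(),
        "in_name_dict": token.lower() in FIRST_NAMES,
        "not_common_word": token.lower() not in COMMON_WORDS,
        "not_sentence_initial": prev_token not in (None, ".", "!", "?"),
    }

# Illustrative hand-set weights standing in for learned parameters.
WEIGHTS = {"capitalized": 1.0, "in_name_dict": 2.0,
           "not_common_word": 0.5, "not_sentence_initial": 1.0}

def is_name(token, prev_token, threshold=2.5):
    score = sum(WEIGHTS[f] for f, on in evidence(token, prev_token).items() if on)
    return score >= threshold

tokens = ["Yesterday", "John", "visited", "London", "."]
names = [t for i, t in enumerate(tokens)
         if is_name(t, tokens[i - 1] if i else None)]
print(names)
# -> ['John', 'London']
```

The sketch shows why combination matters: "Yesterday" is capitalized but sentence-initial and on the common-word list, so no single feature fires strongly enough, while "John" and "London" accumulate enough evidence across sources.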
This paper introduces a statistical model for query-relevant summarization: succinctly characterizing the relevance of a document to a query. Learning parameter values for the proposed model requires a large collection of summarized documents, which we do not have; as a proxy, we use a collection of FAQ (frequently-asked question) documents. Taking a learning approach enables a principled, quantitative evaluation of the proposed system, and the results of some initial experiments, on a collection of Usenet FAQs and on a FAQ-like set of customer-submitted questions to several large retail companies, suggest the plausibility of learning for summarization.
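One common statistical formulation of query relevance, shown here only as an illustrative sketch, scores each candidate passage by the likelihood of generating the query under a smoothed unigram language model of that passage. The model form, the interpolation weight mu, and the example data are all assumptions for the sketch, not details taken from the paper.

```python
# Score passages by log p(query | passage) under a unigram language model,
# linearly interpolated with a background (corpus) model for smoothing.
import math
from collections import Counter

def score_passage(passage, query, corpus_counts, corpus_total, mu=0.5):
    """log p(query | passage) with interpolation against a background model."""
    counts = Counter(passage.lower().split())
    total = sum(counts.values())
    logp = 0.0
    for q in query.lower().split():
        p_doc = counts[q] / total if total else 0.0
        p_bg = corpus_counts[q] / corpus_total if corpus_total else 0.0
        logp += math.log(mu * p_doc + (1 - mu) * p_bg + 1e-12)
    return logp

# Toy FAQ-style passages and query (invented for illustration).
passages = ["reboot the router to fix connection drops",
            "our store hours are nine to five"]
corpus_counts = Counter(" ".join(passages).lower().split())
corpus_total = sum(corpus_counts.values())

query = "fix router connection"
best = max(passages,
           key=lambda p: score_passage(p, query, corpus_counts, corpus_total))
print(best)
```

In a FAQ corpus, question-answer pairs provide exactly the kind of (query, relevant passage) supervision such a model needs, which is what makes FAQs a plausible proxy for summarized documents.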
Search has arguably become the dominant paradigm for finding information on the World Wide Web. In order to build a successful search engine, there are a number of challenges that arise where techniques from artificial intelligence can have a significant impact. In this paper, we explore a number of problems related to finding information on the web and discuss approaches that have been employed in various research programs, including some of those at Google. Specifically, we examine issues such as web graph analysis, statistical methods for inferring meaning in text, and the retrieval and analysis of newsgroup postings, images, and sounds. We show that by leveraging the vast amounts of data on the web, it is possible to address these problems in innovative ways that vastly improve on standard, but often data-impoverished, methods. We also present a number of open research problems to help spur further research in these areas.