He Jing scite author profile

We present a domain-independent topic segmentation algorithm for multi-party speech. Our feature-based algorithm combines knowledge about content using a text-based algorithm as a feature and about form using linguistic and acoustic cues about topic shifts extracted from speech. This segmentation algorithm uses automatically induced decision rules to combine the different features. The embedded text-based algorithm builds on lexical cohesion and has performance comparable to state-of-the-art algorithms based on lexical information. A significant error reduction is obtained by combining the two knowledge sources. Related WorkExisting approaches to textual segmentation can be broadly divided into two categories. On the one hand, many algorithms exploit the fact that topic segments tend to be lexically cohesive. Embodiments of this idea include semantic similarity (Morris and Hirst, 1991;Kozima, 1993), cosine similarity

show abstract

Named entity recognition through classifier combination

Florian

et al. 2003

View full text Add to dashboard Cite

This paper presents a classifier-combination experimental framework for named entity recognition in which four diverse classifiers (robust linear classifier, maximum entropy, transformation-based learning, and hidden Markov model) are combined under different conditions. When no gazetteer or other additional training resources are used, the combined system attains a performance of 91.6F on the English development data; integrating name, location and person gazetteers, and named entity systems trained on additional, more general, data reduces the F-measure error by a factor of 15 to 21% on the English data.

show abstract

Centroid-based summarization of multiple documents

Radev

Jing

Styś

et al. 2004

Information Processing & Management

772

151

View full text Add to dashboard Cite

Centroid-based summarization of multiple documents

2000

View full text Add to dashboard Cite

We present a multi-document summarizer, MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We describe two new techniques, a centroid-based summarizer, and an evaluation scheme based on sentence utility and subsumption. We have applied this evaluation to both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.

show abstract

Sentence reduction for automatic text summarization

Jing¹

2000

196

135

View full text Add to dashboard Cite

We present a novel sentence reduction system for automatically removing extraneous phrases from sentences that are extracted from a document for summarization purpose. The system uses multiple sources of knowledge to decide which phrases in an extracted sentence can be removed, including syntactic knowledge, context information, and statistics computed from a corpus which consists of examples written by human professionals. Reduction can significantly improve the conciseness of automatic summaries.

show abstract

A mention-synchronous coreference resolution algorithm based on the Bell tree

et al. 2004

View full text Add to dashboard Cite

This paper proposes a new approach for coreference resolution which uses the Bell tree to represent the search space and casts the coreference resolution problem as finding the best path from the root of the Bell tree to the leaf nodes. A Maximum Entropy model is used to rank these paths. The coreference performance on the 2002 and 2003 Automatic Content Extraction (ACE) data will be reported. We also train a coreference system using the MUC6 data and competitive results are obtained.

show abstract

The decomposition of human-written summary sentences

1999

View full text Add to dashboard Cite

We define the problem of decomposing human-written summary sentences and propose a novel Hidden Markov Model solution to the problem. Human summarizers often rely on cutting and pasting of the full document to generate summaries. Decomposing a human-written summary sentence requires determining: (1) whether it is constructed by cutting and pasting, (2) what components in the sentence come from the original document, and (3) where in the document the components come from. Solving the decomposition problem can potentially lead to the automatic acquisition of large corpora for summarization. It also sheds light on the generation of summary text by cutting and pasting. The evaluation shows that the proposed decomposition algorithm performs well.

show abstract

A Statistical Model for Multilingual Entity Detection and Tracking

Florian¹,

Hassan²,

Ittycheriah³

et al. 2004

110

View full text Add to dashboard Cite

Entity detection and tracking is a relatively new addition to the repertoire of natural language tasks. In this paper, we present a statistical language-independent framework for identifying and tracking named, nominal and pronominal references to entities within unrestricted text documents, and chaining them into clusters corresponding to each logical entity present in the text. Both the mention detection model and the novel entity tracking model can use arbitrary feature types, being able to integrate a wide array of lexical, syntactic and semantic features. In addition, the mention detection model crucially uses feature streams derived from different named entity classifiers. The proposed framework is evaluated with several experiments run in Arabic, Chinese and English texts; a system based on the approach described here and submitted to the latest Automatic Content Extraction (ACE) evaluation achieved top-tier results in all three evaluation languages.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

He Jing

Discourse segmentation of multi-party conversation

Named entity recognition through classifier combination

Centroid-based summarization of multiple documents

Centroid-based summarization of multiple documents

Sentence reduction for automatic text summarization

A mention-synchronous coreference resolution algorithm based on the Bell tree

The decomposition of human-written summary sentences

A Statistical Model for Multilingual Entity Detection and Tracking

Contact Info

Product

Resources

About