1999
DOI: 10.1023/a:1007506220214

Statistical Models for Text Segmentation

Abstract: This paper introduces a new statistical approach to automatically partitioning text into coherent segments. The approach is based on a technique that incrementally builds an exponential model to extract features that are correlated with the presence of boundaries in labeled training text. The models use two classes of features: topicality features that use adaptive language models in a novel way to detect broad changes of topic, and cue-word features that detect occurrences of specific words, which m…

Citations: 436 publications (32 citation statements)
References: 25 publications
“…To begin with, we calculate the traditional mutual information entropy [21] for x and y. Here the frequency is the number of times that herb x and herb y occur together, and I(x, y, i) is the indicator function of x and y, showing whether herb x and y coexist in formula i.…”
Section: Methods (mentioning)
confidence: 99%
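The formula itself did not survive extraction in the excerpt above. As a hedged illustration only, the sketch below computes a standard pointwise mutual information from indicator-based co-occurrence counts over a collection of formulas; the function name `pmi`, the list-of-sets representation, and the example data are assumptions made here for illustration, not taken from the cited work.

```python
import math

def pmi(x, y, formulas):
    """Pointwise mutual information of herbs x and y over a list of formulas,
    where each formula is a set of herbs.  The co-occurrence count plays the
    role of sum_i I(x, y, i), the indicator that x and y coexist in formula i."""
    n = len(formulas)
    fx = sum(1 for f in formulas if x in f)                  # occurrences of x
    fy = sum(1 for f in formulas if y in f)                  # occurrences of y
    fxy = sum(1 for f in formulas if x in f and y in f)      # co-occurrences
    if fxy == 0 or fx == 0 or fy == 0:
        return float("-inf")  # the pair (or one herb) never appears
    # PMI = log[ p(x, y) / (p(x) p(y)) ] with maximum-likelihood estimates
    return math.log((fxy / n) / ((fx / n) * (fy / n)))

# Example: three small formulas over a toy herb vocabulary
formulas = [{"ginseng", "licorice"}, {"ginseng", "ginger"},
            {"licorice", "ginger", "ginseng"}]
print(pmi("ginseng", "licorice", formulas))
```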
“…Bolzano (1) already noted the need for specific organization in scientific texts, while Ingarden devotes his book (2) to understanding the process by which a text is understood and assimilated. Modern methods (3,4) combine the work of linguists with that of computer scientists, physicists, physiologists, and researchers from many other fields to cover a wide range of texts, from the phoneme (5), going on to words (6-9) and grammar (10,11), all the way to global text analysis (12) and the evolution of language (13,14).…”
mentioning
confidence: 99%
“…The performance of the algorithms applied to the annotated corpora was calculated using four widely known metrics: Precision, Recall, Beeferman's Pk [35] and WindowDiff [36]. For the segmentation task, Precision is defined as the proportion of boundaries chosen that agree with a reference segmentation.…”
Section: Evaluation Metrics (mentioning)
confidence: 99%
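As a point of reference for the boundary-level definitions quoted above, here is a minimal sketch of segmentation Precision and Recall, assuming segmentations are given as sets of boundary positions; the representation and the function name are illustrative, not taken from the cited papers.

```python
def precision_recall(reference, hypothesis):
    """Boundary-level precision and recall for text segmentation.
    `reference` and `hypothesis` are sets of boundary positions
    (e.g. indices of the gaps between sentences where a segment ends)."""
    ref, hyp = set(reference), set(hypothesis)
    correct = len(ref & hyp)                         # proposed boundaries that match the reference
    precision = correct / len(hyp) if hyp else 0.0
    recall = correct / len(ref) if ref else 0.0
    return precision, recall

# Example: reference boundaries after units 3 and 7; hypothesis guesses 3 and 8.
# The near-miss at 8 counts as a full miss, which is the weakness noted below.
print(precision_recall({3, 7}, {3, 8}))  # (0.5, 0.5)
```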
“…However, both metrics suffer from the fact that they penalize near-misses of boundaries as full misses, causing them to drastically overestimate the error. Beeferman's Pk [35] metric attempts to correct the erroneous calculation of penalties performed by Precision and Recall by computing penalties using a sliding window of size k across the text, where k is defined as half of the mean reference segment size. Penalties are calculated by taking into account both the number of windows and whether boundaries appear in different segments in the reference and in the hypothesis segmentations for every window examined.…”
Section: Evaluation Metrics (mentioning)
confidence: 99%
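The description of Pk above maps fairly directly onto code. Below is a hedged sketch of the metric as it is commonly implemented, assuming segmentations are given as lists of segment lengths; details such as the rounding of k and the exact window count vary between implementations, and the names used here are illustrative rather than from the cited papers.

```python
def pk(reference, hypothesis, k=None):
    """Beeferman's Pk.  `reference` and `hypothesis` are sequences of segment
    lengths (e.g. [3, 4, 2] for a 9-unit text split 3/4/2).  k defaults to half
    the mean reference segment length, as in the excerpt above."""
    def labels(masses):
        # Map each text unit to the index of the segment it belongs to.
        out = []
        for seg_id, length in enumerate(masses):
            out.extend([seg_id] * length)
        return out

    ref, hyp = labels(reference), labels(hypothesis)
    assert len(ref) == len(hyp), "segmentations must cover the same text"
    if k is None:
        k = max(1, round(len(ref) / len(reference) / 2))

    disagreements = 0
    windows = len(ref) - k
    for i in range(windows):
        same_ref = ref[i] == ref[i + k]
        same_hyp = hyp[i] == hyp[i + k]
        if same_ref != same_hyp:   # the two ends of the probe fall in the same
            disagreements += 1     # segment in one segmentation but not the other
    return disagreements / windows if windows > 0 else 0.0

# Example: a 10-unit text where the hypothesis misses the second reference
# boundary by one unit; Pk charges only a small penalty for the near-miss.
print(pk([5, 3, 2], [5, 4, 1]))  # 0.125
```

Unlike Precision and Recall, the near-miss in this example is penalized only for the few probe windows that straddle the misplaced boundary, which is exactly the correction the quoted passage describes.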