Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Hu 2018
DOI: 10.18653/v1/n18-2075

Text Segmentation as a Supervised Learning Task

Abstract: Text segmentation, the task of dividing a document into contiguous segments based on its semantic structure, is a longstanding challenge in language understanding. Previous work on text segmentation focused on unsupervised methods such as clustering or graph search, due to the paucity of labeled data. In this work, we formulate text segmentation as a supervised learning problem, and present a large new dataset for text segmentation that is automatically extracted and labeled from Wikipedia. Moreover, we develo…
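
One common way to make segmentation a supervised task is to give every sentence a binary label: 1 if it closes a segment, 0 otherwise. The abstract is truncated, so the sketch below only assumes this sentence-level labeling; the helper names to_labels and from_labels are hypothetical and not taken from the paper's code. It converts a segmented document (for example, Wikipedia sections) into training labels and regroups predicted labels back into segments.

```python
from typing import List, Tuple


def to_labels(segments: List[List[str]]) -> Tuple[List[str], List[int]]:
    """Flatten segments into sentences plus binary end-of-segment labels.

    Hypothetical helper: label 1 marks the last sentence of each segment.
    """
    sentences: List[str] = []
    labels: List[int] = []
    for segment in segments:
        for i, sentence in enumerate(segment):
            sentences.append(sentence)
            # 1 if this sentence closes its segment, else 0
            labels.append(1 if i == len(segment) - 1 else 0)
    return sentences, labels


def from_labels(sentences: List[str], labels: List[int]) -> List[List[str]]:
    """Regroup sentences into segments, closing a segment after each 1."""
    segments: List[List[str]] = []
    current: List[str] = []
    for sentence, label in zip(sentences, labels):
        current.append(sentence)
        if label == 1:
            segments.append(current)
            current = []
    if current:  # trailing sentences without a predicted boundary
        segments.append(current)
    return segments
```

For a two-section document such as [["Born in 1900.", "Married in 1925."], ["Remembered today."]], to_labels yields the labels [0, 1, 1], and from_labels applied to its output restores the original sections. A classifier then only has to predict one bit per sentence.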

Cited by 106 publications (191 citation statements); references 10 publications.
“…More recent approaches (Alemi and Ginsparg, 2015; Glavaš et al, 2016) involve the use of semantic representations of words to compute sentence similarities. Koshorek et al (2018) and Badjatiya et al (2018) propose neural models to identify break points within the text. Sims et al (2019) address the slightly different, but relevant task of event prediction using a neural model, on a human-annotated dataset of short events.…”
Section: Previous Work (mentioning; confidence: 99%)
“…Our models outperform the baselines on all metrics, with the BERT (full window) model for break prediction model giving the best results. The approaches by Reynar (1994) and Utiyama and Isahara (2001), and the neural models proposed by Badjatiya et al (2018) and Koshorek et al (2018) are global models, and are prohibitively expensive on long documents.…”
Section: Algorithm (mentioning; confidence: 99%)
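
The quoted comparison does not name its metrics. One metric widely used for text segmentation is Pk (Beeferman et al., 1999): slide a window of k sentences over the document and count how often the reference and the hypothesis disagree on whether the two window ends fall in the same segment. The sketch below is an illustrative implementation under stated assumptions, not the evaluation code of any cited paper: inputs are hypothetical end-of-segment label lists (1 closes a segment), and k defaults to half the mean reference segment length. Published implementations differ slightly in window counting.

```python
from typing import List, Optional


def segment_ids(labels: List[int]) -> List[int]:
    """Map end-of-segment labels (1 = segment closes here) to a segment index per sentence."""
    ids: List[int] = []
    current = 0
    for label in labels:
        ids.append(current)
        if label == 1:
            current += 1
    return ids


def pk(reference: List[int], hypothesis: List[int], k: Optional[int] = None) -> float:
    """Pk error rate: fraction of windows where reference and hypothesis disagree.

    Assumption: both lists label the same sentences; lower is better (0.0 is perfect).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must label the same sentences")
    if not reference:
        raise ValueError("empty document")
    ref, hyp = segment_ids(reference), segment_ids(hypothesis)
    n = len(ref)
    if k is None:
        # Common convention (assumed here): half the mean reference segment length
        num_segments = ref[-1] + 1
        k = max(1, round(n / num_segments / 2))
    windows = n - k
    if windows <= 0:
        raise ValueError("document too short for window size k")
    # A window counts as an error when exactly one side puts both ends in one segment
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k])
        for i in range(windows)
    )
    return errors / windows
```

With a reference of [0, 1, 0, 1] (two segments of two sentences) and a hypothesis of [0, 0, 0, 1] (one segment), k is 1 and one of the three windows disagrees, giving Pk = 1/3.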
“…This task is often referred to as document segmentation or sometimes simply text segmentation. In Figure 1 we show one example of document segmentation from Wikipedia, on which the task is typically evaluated (Koshorek et al, 2018; Badjatiya et al, 2018).…”
Section: Introduction (mentioning; confidence: 99%)
“…For example, document segmentation has been shown to improve information retrieval by indexing subdocument units instead of full documents (Llopis et al, 2002; Shtekh et al, 2018). Other applications such as summarization and information extraction can also benefit from text segmentation (Koshorek et al, 2018). The aim of document segmentation is breaking the raw text into a sequence of logically coherent sections (e.g., "Early life and marriage" and "Legacy" in our example).…”
Section: Introduction (mentioning; confidence: 99%)