Learning to Rank Semantic Coherence for Topic Segmentation

Wang, Liang; Li, Sujian; Lv, Yajuan; Wang, Houfeng

doi:10.18653/v1/d17-1139

Cited by 12 publications

(19 citation statements)

References 15 publications

“…For automatic text segmentation, multitude of approaches such as lexical overlap, bayesian learning or dynamic programming (Hearst, 1997; Choi, 2000; Utiyama and Isahara, 2001; Eisenstein and Barzilay, 2008; Du et al, 2013) have been proposed. The recent works rely on neural network models to learn different aspects of text segmentation such as coherence and cohesion (Wang et al, 2017; Sehikh et al, 2017; Bahdanau et al, 2016a; Arnold et al, 2019).…”

Section: Related Work (mentioning)

confidence: 99%

Profiling News Discourse Structure Using Explicit Subtopic Structures Guided Critics

Choubey, Huang

2021

Findings of the Association for Computational Linguistics: EMNLP 2021

We present an actor-critic framework to induce subtopical structures in a news article for news discourse profiling. The model uses multiple critics that act according to known subtopic structures while the actor aims to outperform them. The content structures constitute sentences that represent latent subtopic boundaries. Then, we introduce a hierarchical neural network that uses the identified subtopic boundary sentences to model multi-level interaction between sentences, subtopics, and the document. Experimental results and analyses on the NewsDiscourse corpus show that the actor model learns to effectively segment a document into subtopics and improves the performance of the hierarchical model on the news discourse profiling task.

“…Recently, proposed SegBot, a bidirectional RNN coupled with a pointer network that addresses both topic segmentation and EDU. Also, LSTM or CNN based approaches have been proposed, for instance through bidirectional layers (Sehikh et al, 2017), sentence embedding-based with four layers bidirectional LSTM (Koshorek et al, 2018) or through two symmetric CNN (Wang et al, 2017), etc. Finally, Arnold et al (2019) proposed Sector, the first LSTM-based architecture that combines topical (latent semantic content) and structural information (segmentation) as a mutual task.…”

Section: Related Work (mentioning)

confidence: 99%

“…On the contrary, if their similarity is below a certain threshold, a shift is determined (Hearst, 1997; Riedl and Biemann, 2012). When sufficient topically annotated training data are available, deep neural approaches based on CNN (Wang et al, 2017) or LSTM (Koshorek et al, 2018) can be efficiently applied (Arnold et al, 2019). Until now, text segmentation methods have exclusively addressed data sets lying within the scope of narrative and expository texts or user dialogues texts and sometimes artificially generated data (Choi, 2000; Jeong and Titov, 2010; Glavaš et al, 2016; Koshorek et al, 2018).…”

Section: Introduction (mentioning)

confidence: 99%

Hierarchical Text Segmentation for Medieval Manuscripts

Hazem, Daille, Stutzmann et al.

2020

Proceedings of the 28th International Conference on Computational Linguistics

In this paper, we address the segmentation of books of hours, Latin devotional manuscripts of the late Middle Ages, that exhibit challenging issues: a complex hierarchical entangled structure, variable content, noisy transcriptions with no sentence markers, and strong correlations between sections for which topical information is no longer sufficient to draw segmentation boundaries. We show that the main state-of-the-art segmentation methods are either inefficient or inapplicable for books of hours and propose a bottom-up greedy approach that considerably enhances the segmentation results. We stress the importance of such hierarchical segmentation of books of hours for historians to explore their overarching differences underlying conception about Church.

“…In another line of research, Wang et al (2017) combined learning to rank and a convolutional neural network to learn a coherence function between text pairs; higher-ranked pairs are likely to be segments. Despite a promising approach, state-of-the-art results were not achieved.…”

Section: Related Work (mentioning)

confidence: 99%

BeamSeg: A Joint Model for Multi-Document Segmentation and Topic Identification

Mota, Eskénazi, Coheur

2019

Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

We propose BeamSeg, a joint model for segmentation and topic identification of documents from the same domain. The model assumes that lexical cohesion can be observed across documents, meaning that segments describing the same topic use a similar lexical distribution over the vocabulary. The model implements lexical cohesion in an unsupervised Bayesian setting by drawing from the same language model segments with the same topic. Contrary to previous approaches, we assume that language models are not independent, since the vocabulary changes in consecutive segments are expected to be smooth and not abrupt. We achieve this by using a dynamic Dirichlet prior that takes into account data contributions from other topics. BeamSeg also models segment length properties of documents based on modality (textbooks, slides, etc.). The evaluation is carried out in three datasets. In two of them, improvements of up to 4.8% and 7.3% are obtained in the segmentation and topic identifications tasks, indicating that both tasks should be jointly modeled.
