2011
DOI: 10.1016/j.ipm.2010.11.008

Text segmentation: A topic modeling perspective

Abstract: In this paper, the task of text segmentation is approached from a topic modeling perspective. We investigate the use of two unsupervised topic models, latent Dirichlet allocation (LDA) and multinomial mixture (MM), to segment a text into semantically coherent parts. The proposed topic model based approaches consistently outperform a standard baseline method on several datasets. A major benefit of the proposed LDA based approach is that along with the segment boundaries, it outputs the topic distribution associ…

citations
Cited by 55 publications
(39 citation statements)
references
References 25 publications
“…This suggests an approach based on document passages. Some previous work on topic segmentation in a document [15,16,17] and passage-based retrieval [18,19] has validated this idea. Here we use it in translation model training.…”
Section: Position-aligned Translation Model (mentioning)
confidence: 85%
“…The algorithm combines the segmentation probability model of Eisenstein with the non uniform prior on segmentations from (Utiyama & Isahara, 2001). Misra et al (Misra, Yvon, Cappe, & Jose, 2011) adopt a similar approach and use a segment prior similar to that of Utiyama, but consider segmentation probabilities based on latent Dirichlet allocation and multinomial mixture models. The Bayesian segmentation algorithm in Section 3 could be replaced with Misra's algorithm.…”
Section: Discussion (mentioning)
confidence: 99%
“…More recently, topic modeling (notably LDA) has been applied to discourse segmentation as well (e.g. Misra et al (2011); see also Riedl and Biemann (2012) for an overview). The dominant interest is on topical shifts in text as indicator of discourse structure, however topic modeling estimation is computationally expensive and needs domain-adaptation.…”
Section: Related Work (mentioning)
confidence: 99%