Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies Short Pa 2008
DOI: 10.3115/1557690.1557768

Text segmentation with LDA-based Fisher kernel

Abstract: In this paper we propose a domain-independent text segmentation method that consists of three components. Latent Dirichlet allocation (LDA) is employed to compute the semantic distribution of words, and semantic similarity is measured with the Fisher kernel. Finally, the globally best segmentation is found by dynamic programming. Experiments on Chinese data sets show that the technique can be effective. By introducing latent semantic information, our algorithm is robust on irregular-sized segments.
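
The dynamic-programming step in the abstract can be illustrated with a minimal sketch. This is not the authors' objective function: the abstract does not give one, so the sketch assumes each sentence already has a topic-probability vector (as LDA inference would produce), swaps in plain cosine similarity where the paper uses the Fisher kernel, and adds a hypothetical per-segment `penalty` to stop the search from splitting everywhere. The names `cosine`, `segment_cost`, and `segment` are illustrative only.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; a stand-in for the Fisher kernel (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def segment_cost(vecs: Sequence[Sequence[float]], i: int, j: int) -> float:
    """Incoherence of sentences i..j-1: summed (1 - similarity to the mean vector)."""
    n = j - i
    dim = len(vecs[0])
    mean = [sum(v[k] for v in vecs[i:j]) / n for k in range(dim)]
    return sum(1.0 - cosine(v, mean) for v in vecs[i:j])


def segment(vecs: Sequence[Sequence[float]], penalty: float) -> List[int]:
    """Return the exclusive end index of each segment in the globally cheapest split.

    `penalty` is a hypothetical per-segment cost, not a parameter from the paper.
    """
    n = len(vecs)
    best = [0.0] + [math.inf] * n  # best[j]: cheapest cost of segmenting vecs[:j]
    back = [0] * (n + 1)           # back[j]: start index of the last segment in that split
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + segment_cost(vecs, i, j) + penalty
            if cost < best[j]:
                best[j] = cost
                back[j] = i
    # Walk the back-pointers from the end to recover the boundaries.
    bounds = []
    j = n
    while j > 0:
        bounds.append(j)
        j = back[j]
    return bounds[::-1]


# Toy topic vectors (made up): three sentences on topic 0, then two on topic 1.
toy_vecs = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8]]
print(segment(toy_vecs, penalty=0.1))
```

On the toy vectors the cheapest split puts the boundary after the third sentence, so the call returns `[3, 5]`. Raising `penalty` far enough collapses everything into one segment. Because the search considers every possible start for every end position, it finds the global optimum for this cost rather than a greedy one, which is the property the abstract attributes to the dynamic-programming component.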

Cited by 26 publications (26 citation statements)
References 10 publications
“…Latent Dirichlet Allocation (LDA) [3] can also be used for computing word relatedness by representing words as vectors of probabilities over each topic. Sun et al [23] used LDA in a related application of text segmentation, using a Fisher kernel.…”
Section: Related Work (mentioning)
confidence: 99%
“…In the domain of text segmentation, the work in Sun et al (2008) used an LDA-based Fisher kernel to measure text semantic similarity between blocks of documents in the form of latent semantic topics that were previously inferred using LDA. The kernel is controlled by the number of shared semantics and word co-occurrences.…”
Section: Related Work (mentioning)
confidence: 99%
“…One of the first probabilistic algorithms has been introduced by Utiyama and Isahara (2001). LDA based approaches were first described by Sun et al (2008) and improved by Misra et al (2009). The newest LDA based segmenter is TT.…”
Section: Related Work (mentioning)
confidence: 99%