Text segmentation with LDA-based Fisher kernel

Sun, Qi; Li, Runxin; Luo, Dingsheng; Wu, Xihong

doi:10.3115/1557690.1557768

Cited by 26 publications

(26 citation statements)

References 10 publications


“…Latent Dirichlet Allocation (LDA) [3] can also be used for computing word relatedness by representing words as vectors of probabilities over each topic. Sun et al [23] used LDA in a related application of text segmentation, using a Fisher kernel.…”

Section: Related Work (mentioning)

confidence: 99%

Large-scale learning of word relatedness with constraints

Halawi, Dror, Gabrilovich, et al. 2012

Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining


Prior work on computing semantic relatedness of words focused on representing their meaning in isolation, effectively disregarding inter-word affinities. We propose a large-scale data mining approach to learning word-word relatedness, where known pairs of related words impose constraints on the learning process. Our method, called CLEAR, is shown to significantly outperform previously published approaches. The proposed method is based on first principles, and is generic enough to exploit diverse types of text corpora, while having the flexibility to impose constraints on the derived word similarities. We also make publicly available a new labeled dataset for evaluating word relatedness algorithms, which we believe to be the largest such dataset to date.
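The citation statement for this entry describes representing words as vectors of probabilities over topics. A minimal sketch of that idea follows, assuming a hypothetical table of p(word | topic) values and a uniform prior over topics. The numbers and the names TOPIC_WORD, topic_vector and relatedness are invented for illustration and are not taken from either paper.

```python
import math

# Hypothetical p(word | topic) values for 3 topics.
# These numbers are made up for illustration, not from a trained model.
TOPIC_WORD = {
    "bank":  [0.30, 0.02, 0.01],
    "money": [0.25, 0.01, 0.02],
    "river": [0.01, 0.30, 0.02],
    "water": [0.02, 0.28, 0.03],
}

def topic_vector(word):
    """Return p(topic | word), assuming a uniform prior over topics."""
    weights = TOPIC_WORD[word]
    total = sum(weights)
    return [w / total for w in weights]

def relatedness(word_a, word_b):
    """Cosine similarity between the topic vectors of two words."""
    a, b = topic_vector(word_a), topic_vector(word_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

With these toy values, relatedness("bank", "money") comes out higher than relatedness("bank", "river"), because the first pair concentrates its probability mass on the same topic.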


“…In the domain of text segmentation, the work in Sun et al (2008) used an LDA-based Fisher kernel to measure text semantic similarity between blocks of documents in the form of latent semantic topics that were previously inferred using LDA. The kernel is controlled by the number of shared semantics and word co-occurrences.…”

Section: Related Work (mentioning)

confidence: 99%

Embedding Semantics in LDA Topic Models

et al. 2010
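The statement above describes comparing adjacent blocks of a document through their LDA topic representations. The sketch below illustrates only that general idea: it swaps in plain cosine similarity over per-block topic proportions and is not the Fisher kernel of Sun et al. The block vectors, the threshold of 0.5, and the function names are all hypothetical.

```python
import math

def cosine(p, q):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(x * y for x, y in zip(p, q))
    norm_p = math.sqrt(sum(x * x for x in p))
    norm_q = math.sqrt(sum(y * y for y in q))
    return dot / (norm_p * norm_q)

def boundary_candidates(block_topics, threshold=0.5):
    """Return each index i where blocks i and i+1 look topically dissimilar."""
    return [
        i
        for i in range(len(block_topics) - 1)
        if cosine(block_topics[i], block_topics[i + 1]) < threshold
    ]

# Hypothetical topic proportions for four consecutive text blocks.
blocks = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
]
```

Calling boundary_candidates(blocks) on this toy input flags the gap between the second and third blocks, where the dominant topic changes. A kernel-based similarity would replace the cosine function while leaving the boundary search unchanged.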


“…One of the first probabilistic algorithms has been introduced by Utiyama and Isahara (2001). LDA based approaches were first described by Sun et al (2008) and improved by Misra et al (2009). The newest LDA based segmenter is TT.…”

Section: Related Work (mentioning)

confidence: 99%

Using Text Segmentation Algorithms for the Automatic Generation of E-Learning Courses

Ozmen, Streicher, Zielinski 2014

Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014)


With the advent of e-learning, there is a strong demand for tools that help to create e-learning courses in an automatic or semi-automatic way. While resources for new courses are often freely available, they are generally not properly structured into easy to handle units. In this paper, we investigate how state of the art text segmentation algorithms can be applied to automatically transform unstructured text into coherent pieces appropriate for e-learning courses. The feasibility to course generation is validated on a test corpus specifically tailored to this scenario. We also introduce a more generic training and testing method for text segmentation algorithms based on a Latent Dirichlet Allocation (LDA) topic model. In addition we introduce a scalable random text segmentation algorithm, in order to establish lower and upper bounds to be able to evaluate segmentation results on a common basis.
