2004
DOI: 10.1023/b:jiis.0000039534.65423.00

A Dynamic Programming Algorithm for Linear Text Segmentation

Cited by 64 publications (48 citation statements)
References 22 publications
“…The studies using text were pioneered by the TextTiling approach [2], where adjacent sentence blocks were compared using a similarity measure based on bag-of-words (BOW) features, such as term frequency-inverse document frequency (tf-idf). Later studies indicated that globally optimized segmentation methods, such as dynamic programming (DP) and the hidden Markov model (HMM) [3,4,14], can improve the performance, and usage of probabilistic topic modeling such as probabilistic latent semantic analysis (pLSA) [15,7] and latent Dirichlet allocation (LDA) [16,17] can further increase the accuracy. Analogous to approaches used in automatic speech recognition (ASR), deep neural networks have been combined with HMMs (DNN-HMM) and successfully applied to story segmentation, using BOW features of text data, with significant improvement in performance [18].…”
Section: Introduction (mentioning)
confidence: 99%
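
For readers unfamiliar with the block-comparison idea mentioned in the statement above, the following sketch is an illustrative implementation, not TextTiling's original algorithm: each candidate gap between sentences is scored by the cosine similarity of tf-idf bag-of-words vectors built from the blocks of sentences on either side, and low-similarity gaps suggest topic boundaries. The tokenizer, idf smoothing, and block size are assumptions made only to keep the example runnable.

```python
# Illustrative TextTiling-style block comparison (not the original algorithm):
# adjacent blocks of sentences are turned into tf-idf bag-of-words vectors and
# compared with cosine similarity; low similarity suggests a topic boundary.
import math
from collections import Counter

def idf_from_sentences(token_lists):
    """Smoothed inverse document frequency computed over sentences."""
    df = Counter()
    for toks in token_lists:
        df.update(set(toks))
    n = len(token_lists)
    return {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}

def tfidf(tokens, idf):
    """tf-idf vector (as a dict) for one block of tokens."""
    tf = Counter(tokens)
    return {t: tf[t] * idf.get(t, 0.0) for t in tf}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def boundary_scores(sentences, block_size=3):
    """Similarity of adjacent sentence blocks at every candidate gap;
    local minima in this curve suggest topic shifts."""
    token_lists = [s.lower().split() for s in sentences]
    idf = idf_from_sentences(token_lists)
    scores = []
    for gap in range(block_size, len(token_lists) - block_size + 1):
        left = [t for toks in token_lists[gap - block_size:gap] for t in toks]
        right = [t for toks in token_lists[gap:gap + block_size] for t in toks]
        scores.append((gap, cosine(tfidf(left, idf), tfidf(right, idf))))
    return scores
```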
“…Choi (2000) introduced a probabilistic algorithm using matrix-based ranking and clustering to determine similarities between segments. Galley et al. (2003) combined content-based information with acoustic cues in order to detect discourse shifts, whereas Utiyama and Isahara (2001) and Fragkou et al. (2004) minimized different segmentation cost functions with dynamic programming.…”
Section: Related Work (mentioning)
confidence: 99%
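
The statement above casts segmentation as minimizing a cost function with dynamic programming. The sketch below is a generic, hedged illustration of that formulation: seg_cost is a hypothetical plug-in measuring the incoherence of a candidate segment and boundary_penalty is an assumed regularizer on the number of segments; neither reproduces the exact objective of Utiyama and Isahara (2001) or Fragkou et al. (2004).

```python
# Generic additive segmentation cost minimized by dynamic programming.
# seg_cost(i, j) is assumed to return the cost of grouping sentences i..j-1
# into one segment; any such function can be plugged in.

def segment(n, seg_cost, boundary_penalty=1.0):
    """Return internal segment boundaries minimizing the sum of segment costs
    plus a fixed penalty per segment, via O(n^2) dynamic programming."""
    INF = float("inf")
    best = [INF] * (n + 1)   # best[j]: minimal cost of segmenting sentences 0..j-1
    back = [0] * (n + 1)     # back[j]: start index of the last segment in that optimum
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + seg_cost(i, j) + boundary_penalty
            if cost < best[j]:
                best[j], back[j] = cost, i
    # Recover the segmentation by walking the back-pointers from the end.
    cuts, j = [], n
    while j > 0:
        cuts.append(back[j])
        j = back[j]
    return sorted(cuts)[1:]  # drop the leading 0; the rest are segment start positions
```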
“…An extensive discussion of precisely the same problem addressed here, but with a different approach to its solution, is in [3], [4]. Work by Hubert [10], [11], with applications to meteorology, influenced Kehagias and co-workers [8], [15], [16], [17], [18], [19], who developed a dynamic programming algorithm much like ours, for applications such as text segmentation (see also [9]), where the raw data are provided in the form of a similarity matrix. [22] gives an O(kN²) dynamic programming algorithm for finding the optimal partition of an interval into k blocks, for a given k. See also [20], [2] for related work.…”
Section: Introduction: The Problem (mentioning)
confidence: 99%
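
To make the O(kN²) complexity mentioned above concrete, here is a minimal sketch under assumptions of my own: a similarity matrix is partitioned into exactly k contiguous blocks, each block is scored by the sum of pairwise similarities it contains (one plausible block score, not necessarily the criterion used in the cited works), and 2D prefix sums make each block score an O(1) lookup, so the triple loop runs in O(kN²) time.

```python
# Hypothetical O(k N^2) dynamic program: partition items 0..N-1 into k
# contiguous blocks maximizing total within-block similarity, given an
# N x N similarity matrix S. Block scores come from 2D prefix sums.
import numpy as np

def partition_k_blocks(S, k):
    S = np.asarray(S, dtype=float)
    N = S.shape[0]
    # P[a, b] = sum of S[0:a, 0:b] (2D prefix sums), so any square is O(1).
    P = np.zeros((N + 1, N + 1))
    P[1:, 1:] = np.cumsum(np.cumsum(S, axis=0), axis=1)

    def block_score(i, j):
        # Sum of S over the square i..j-1 x i..j-1.
        return P[j, j] - P[i, j] - P[j, i] + P[i, i]

    NEG = float("-inf")
    best = [[NEG] * (N + 1) for _ in range(k + 1)]
    back = [[0] * (N + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):              # number of blocks used so far
        for j in range(m, N + 1):          # items 0..j-1 covered
            for i in range(m - 1, j):      # last block is i..j-1
                cand = best[m - 1][i] + block_score(i, j)
                if cand > best[m][j]:
                    best[m][j], back[m][j] = cand, i
    # Recover the block start indices by backtracking.
    starts, j = [], N
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    return sorted(starts), best[k][N]
```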