2017
DOI: 10.1109/TASLP.2016.2626965
Modeling Latent Topics and Temporal Distance for Story Segmentation of Broadcast News

Abstract: This paper studies a strategy to model latent topics and temporal distance of text blocks for story segmentation, which we call Graph Regularization in Topic Modeling (GRTM). We propose two novel approaches that consider both temporal distance and lexical similarity of text blocks, collectively referred to as data proximity, in learning latent topic representation, where a graph regularizer is involved to derive the latent topic representation while preserving data proximity. In the first approach, we extend t…
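
The "graph regularizer" mentioned in the abstract can be made concrete with a standard graph-regularized factorization objective. The sketch below is generic (in the style of graph-regularized NMF), not the paper's exact formulation; the symbols X, U, V, W, L, and λ are illustrative:

```latex
\min_{U, V}\; \lVert X - U V^{\top} \rVert_F^2
  \;+\; \lambda \cdot \tfrac{1}{2} \sum_{i,j} w_{ij} \lVert v_i - v_j \rVert^2,
\qquad
\tfrac{1}{2} \sum_{i,j} w_{ij} \lVert v_i - v_j \rVert^2
  = \operatorname{Tr}\!\left(V^{\top} L V\right), \quad L = D - W
```

Here X is the term/text-block matrix, row v_i of V is the latent topic representation of text block i, and w_ij encodes data proximity (large when blocks i and j are lexically similar and temporally close). Minimizing the regularizer keeps proximal blocks close in topic space, which is the property the abstract describes as "preserving data proximity."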

Cited by 15 publications (4 citation statements)
References 43 publications (60 reference statements)
“…LSI has been widely discussed in studies [8,11,29]. pLSI is an enhancement of LSI that is able to model every word as a representation of several topics, which can overcome the problems of synonymy and polysemy [4,14,18,31]. However, pLSI performs this modeling only at the document level.…”
Section: Introduction
confidence: 99%
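
The LSI baseline discussed in this quote can be sketched with off-the-shelf tools. Below is a minimal, illustrative example of LSI as truncated SVD over a TF-IDF matrix using scikit-learn; the toy corpus and the choice of n_components=2 are assumptions, not values from the cited work:

```python
# LSI sketch: low-rank SVD of a TF-IDF term-document matrix.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "stocks fell on wall street today",
    "the market rallied as shares rose",
    "heavy rain flooded the coastal town",
]

tfidf = TfidfVectorizer().fit_transform(docs)       # sparse doc-term matrix
lsi = TruncatedSVD(n_components=2, random_state=0)  # 2 latent dimensions
topic_vectors = lsi.fit_transform(tfidf)            # one dense vector per doc
print(topic_vectors.shape)                          # (3, 2)
```

Unlike LSI's purely algebraic factors, pLSI (as the quote notes) gives each word a probabilistic mixture over topics, which is what lets it address synonymy and polysemy.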
“…This feature vector is then classified into its respective class label via a machine learning algorithm [10,15–18]. Since BoW features are highly sparse and lack diversity [19], topic modeling approaches such as latent Dirichlet allocation (LDA) [20] have been developed. These admixture approaches were originally employed for document classification as they offer linguistic insights into language patterns by grouping associated words into topics and, thereafter, computing the probabilities of topics occurring in each document [21,22].…”
Section: Motivation
confidence: 99%
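
The LDA-based features this quote refers to can be illustrated in a few lines. This is a minimal sketch using scikit-learn; the toy corpus, n_components=2, and treating the document-topic distribution as a classification feature vector are assumptions for illustration:

```python
# LDA sketch: document-topic probabilities as dense features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the senate passed the budget bill",
    "the team won the championship game",
    "parliament debated the new tax law",
    "the striker scored twice in the final",
]

counts = CountVectorizer().fit_transform(docs)  # bag-of-words counts
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(counts)  # row i: P(topic | document i)
print(doc_topic.round(2))              # dense 4x2 feature matrix
```

Each row is a low-dimensional, non-sparse summary of a document, addressing the sparsity of raw BoW vectors that the quote mentions.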
“…The scatter plot in Figure 6.3(c), on the other hand, exhibits a less skewed set of points based on the scaling in (6.19), which is reflected in the histogram by a lighter-tailed distribution. The above implies that moderate emphasis is given to less significant topic probabilities, which is subsequently shown to result in better feature representation for classification.…”
Section: Chapter Summary
confidence: 99%
“…SPIGA [83] represents documents by performing EDL and constructing a weighted bag-of-concepts from the linked entities. Also, graph-based models such as and-or graphs (AOG) [120] and graph regularization methods [32] have recently achieved high-accuracy results and are useful for multimodal topic modeling. For example, in [120] a novel representation using a Multimodal Topic And-Or Graph (MT-AOG) is presented.…”
Section: Topic Modeling
confidence: 99%
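
The weighted bag-of-concepts that this quote attributes to SPIGA can be pictured with a small, hypothetical sketch; the entity names, the confidence scores, and the additive weighting below are invented for illustration and may differ from SPIGA's actual construction:

```python
# Hypothetical weighted bag-of-concepts from entity-linking output.
from collections import defaultdict

# (concept, linker confidence) pairs for one document, e.g. from an
# entity disambiguation/linking (EDL) step. Values are invented.
linked_entities = [
    ("Barack_Obama", 0.95),
    ("White_House", 0.80),
    ("Barack_Obama", 0.90),
]

bag_of_concepts = defaultdict(float)
for concept, confidence in linked_entities:
    # Accumulate confidence so repeated mentions weigh more.
    bag_of_concepts[concept] += confidence

print(dict(bag_of_concepts))  # {'Barack_Obama': 1.85, 'White_House': 0.8}
```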