2012 IEEE Spoken Language Technology Workshop (SLT)
DOI: 10.1109/slt.2012.6424226

Modeling multiword phrases with constrained phrase trees for improved topic modeling of conversational speech

Abstract: Latent topic modeling has proven to be an effective means for learning the underlying semantic content within document collections. Latent topic modeling has traditionally been applied to bag-of-words representations, which discard word-sequence information that can aid semantic understanding. In this work we introduce a method for efficiently incorporating arbitrarily long word sequences into a topic modeling approach. This method iteratively constructs a constrained set of phrase trees in an unsupervised fashion…
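The construction of the constrained phrase trees is not reproduced in this truncated abstract. As a rough, hedged illustration of the general idea (not the paper's exact algorithm), the sketch below iteratively merges frequent adjacent word pairs into single phrase tokens so that multiword phrases can enter an otherwise standard bag-of-words topic model; the thresholds, round count, and the optional gensim calls are assumptions.

```python
# A minimal sketch (not the paper's exact algorithm): iteratively merge frequent
# adjacent word pairs into single phrase tokens, then fit a standard bag-of-words
# topic model on the phrase-augmented documents. Parameter values are illustrative.
from collections import Counter

def merge_phrases(docs, min_count=3, rounds=2):
    """Greedily join frequent adjacent word pairs into 'w1_w2' tokens.

    Repeating the pass lets phrases grow beyond two words, loosely mirroring
    the iterative construction of longer word sequences described in the paper.
    """
    for _ in range(rounds):
        bigram_counts = Counter(
            (doc[i], doc[i + 1]) for doc in docs for i in range(len(doc) - 1)
        )
        keep = {bg for bg, c in bigram_counts.items() if c >= min_count}
        merged_docs = []
        for doc in docs:
            merged, i = [], 0
            while i < len(doc):
                if i + 1 < len(doc) and (doc[i], doc[i + 1]) in keep:
                    merged.append(doc[i] + "_" + doc[i + 1])
                    i += 2
                else:
                    merged.append(doc[i])
                    i += 1
            merged_docs.append(merged)
        docs = merged_docs
    return docs

# The phrase-merged documents can then be fed to any bag-of-words topic model
# (PLSA, LDA, ...), e.g. with gensim:
#   from gensim.corpora import Dictionary
#   from gensim.models import LdaModel
#   docs = merge_phrases(tokenized_docs)
#   dictionary = Dictionary(docs)
#   lda = LdaModel([dictionary.doc2bow(d) for d in docs],
#                  num_topics=10, id2word=dictionary)
```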

Cited by 4 publications (3 citation statements)
References 12 publications (6 reference statements)
“…Its value ranges between 0 and 1, with 1 representing a perfect mapping between the true topics and the latent topics. Table 3 shows the NMI scores for a uniform random assignment of documents to latent topics, the hard agglomerative clustering used for initialization, both latent models, and a phrase-based PLSA model applied to text transcripts of the data [10]. Both models do a surprisingly good job of learning latent topics with a strong mapping to the true topics, given the fully unsupervised nature of the system.…”
Section: Methods
confidence: 99%
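For reference, the normalized mutual information (NMI) score discussed in this passage can be computed between true and latent topic assignments as in the sketch below; the label arrays and the use of scikit-learn are illustrative, not taken from the cited work.

```python
# A minimal sketch of the NMI evaluation described above: compare true topic
# labels with latent-topic assignments. The label arrays here are illustrative.
from sklearn.metrics import normalized_mutual_info_score

true_topics   = [0, 0, 1, 1, 2, 2]   # ground-truth topic per document
latent_topics = [1, 1, 0, 0, 2, 2]   # most probable latent topic per document

# NMI is invariant to label permutation; a perfect one-to-one mapping scores 1.0.
nmi = normalized_mutual_info_score(true_topics, latent_topics)
print(f"NMI = {nmi:.3f}")  # 1.0 for this perfectly aligned example
```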
“…Make a set of all keywords for both RRL and LRL. 4. Train a Convolutional Neural Network (CNN) using audio waveforms in the RRL.…”
Section: Model Training
confidence: 99%
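The cited work's network architecture is not given here; the sketch below is only an assumed, minimal example of a 1D convolutional network trained directly on raw audio waveforms, with layer sizes, sample rate, and the number of keyword classes chosen for illustration.

```python
# A minimal sketch (architecture details are assumed, not from the cited work):
# a small 1D CNN that maps raw audio waveforms to keyword classes.
import torch
import torch.nn as nn

class WaveformCNN(nn.Module):
    def __init__(self, num_keywords: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=80, stride=4), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.AdaptiveAvgPool1d(1),            # collapse the time axis
        )
        self.classifier = nn.Linear(32, num_keywords)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, 1, num_samples), e.g. 1 s of 16 kHz audio
        return self.classifier(self.features(waveform).squeeze(-1))

model = WaveformCNN(num_keywords=10)
logits = model(torch.randn(8, 1, 16000))                 # dummy batch of one-second clips
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 10, (8,)))
loss.backward()                                          # one illustrative backward pass
```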
“…Topic detection is a heavily studied problem, including methods specialized for both text [2] and speech [3] sources. Topic detection and tracking from speech is most accurately performed when one can first perform automatic speech recognition (ASR), then apply text-oriented topic detection methods such as latent Dirichlet allocation [2] or partial semantic parse [4]. It has been demonstrated that ASR-based topic detection outperforms methods without transcription, even when the ASR output has a relatively high error rate [5,6].…”
Section: Introduction
confidence: 99%
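The ASR-then-topic-detection pipeline described in this passage can be sketched as below; `transcribe` is a hypothetical placeholder for any ASR system, and the gensim LDA parameters are illustrative rather than those of any cited system.

```python
# A minimal sketch of the ASR-then-topic-detection pipeline described above.
# `transcribe` is a placeholder for any ASR system; the gensim calls are
# standard LDA usage with illustrative parameter values.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

def transcribe(audio_path: str) -> list[str]:
    """Placeholder ASR step: return a token list for one recording."""
    raise NotImplementedError("plug in an ASR system here")

def detect_topics(audio_paths, num_topics=10):
    docs = [transcribe(p) for p in audio_paths]          # 1) speech -> text
    dictionary = Dictionary(docs)                        # 2) bag-of-words vocabulary
    corpus = [dictionary.doc2bow(d) for d in docs]
    lda = LdaModel(corpus, num_topics=num_topics,        # 3) latent Dirichlet allocation
                   id2word=dictionary, passes=5)
    # Return the most probable latent topic per recording
    return [max(lda[bow], key=lambda t: t[1])[0] for bow in corpus]
```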