2018
DOI: 10.1080/19312458.2018.1430754

Applying LDA Topic Modeling in Communication Research: Toward a Valid and Reliable Methodology

Abstract: Latent Dirichlet allocation (LDA) topic models are increasingly being used in communication research. Yet, questions regarding reliability and validity of the approach have received little attention thus far. In applying LDA to textual data, researchers need to tackle at least four major challenges that affect these criteria: (a) appropriate pre-processing of the text collection; (b) adequate selection of model parameters, including the number of topics to be generated; (c) evaluation of the model's reliabilit…

Cited by 533 publications (418 citation statements)
References 48 publications
“…Furthermore, the models also reveal the proportion of each topic in the documents; a document belongs to all topics, but a topic-per-document probability reveals its proportion: these posterior values range from 0 to 1 so that, in total, per document, they sum to 1. In topic modelling, pre-processing unstructured text data is crucial to increasing reliability and making a valid interpretation of the topic (Maier et al, 2018). We proceeded by lemmatizing the content, removing stop-words and, lastly, applying relative pruning to remove both rare and very frequent words (Denny and Spirling, 2018).…”
Section: Discussion (mentioning)
confidence: 99%
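
The pre-processing step quoted above (relative pruning of rare and very frequent words) and the property that topic proportions sum to 1 per document can be illustrated with a minimal sketch. It is not the citing authors' code: the function names `relative_prune` and `normalize`, the toy corpus, and the thresholds 0.5 and 0.75 are assumptions chosen only for illustration.

```python
from collections import Counter


def relative_prune(docs, min_share, max_share):
    """Keep only terms whose share of documents lies in [min_share, max_share].

    Hypothetical helper: the thresholds are illustrative, not values
    taken from the quoted study.
    """
    n_docs = len(docs)
    # Document frequency: each term counts at most once per document.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    keep = {
        term
        for term, df in doc_freq.items()
        if min_share <= df / n_docs <= max_share
    }
    return [[term for term in doc if term in keep] for doc in docs]


def normalize(weights):
    """Rescale non-negative topic weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


# Toy corpus, already tokenized and lemmatized (assumed input).
docs = [
    ["the", "election", "campaign", "tweet"],
    ["the", "election", "candidate", "vote"],
    ["the", "campaign", "candidate", "rally"],
    ["the", "vote", "tweet", "zeitgeist"],
]

# "the" appears in every document and "rally"/"zeitgeist" in only one,
# so all three fall outside the assumed [0.5, 0.75] band and are dropped.
pruned = relative_prune(docs, min_share=0.5, max_share=0.75)

# Per-document topic proportions: values in [0, 1] that sum to 1.
proportions = normalize([2.0, 1.0, 1.0])
```

Relative thresholds scale with corpus size, which is why they are expressed as shares of documents rather than absolute counts.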
“…In order to characterise the type of communication activity which political candidates engage in during the campaign period on Twitter, we make use of a series of topic models, a technique which is increasingly used in communication research (Maier et al, 2018). The approach allows us to extract a discrete number of general topics from the textual data within candidate tweets.…”
Section: Appendix (mentioning)
confidence: 99%
“…To draw adequate conclusions, the interpretation of the latent variables must be substantially validated [8]. Several authors proposed guidance for evaluation and validating LDA models [9]. We studied the coherence and perplexity of different LDA resulted models, choose the model that had the best coherence value, filtered the articles written on software complexity starting from design patterns discovered topics and conducted the analysis of resulted papers by reviewing their subject and method of research.…”
Section: Fig. 2 LDA Research Approach (mentioning)
confidence: 99%
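
Choosing among candidate models by coherence, as the last statement describes, might look like the sketch below. The statement does not say which coherence measure was used, so UMass coherence is an assumption here, as are the function name `umass_coherence`, the toy corpus, and the two hypothetical candidate models, each represented only by the top words of its topics.

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence of one topic, given its top words in rank order.

    Sums log((D(w_m, w_l) + 1) / D(w_l)) over all pairs where w_l ranks
    above w_m; D counts the documents containing all the given words.
    Every top word must occur in at least one document.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_count(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            joint = doc_count(top_words[m], top_words[l])
            score += math.log((joint + 1) / doc_count(top_words[l]))
    return score


# Toy corpus of tokenized documents (assumed input).
docs = [
    ["design", "pattern", "software"],
    ["design", "pattern", "complexity"],
    ["software", "complexity", "metric"],
    ["topic", "model", "coherence"],
]

# Hypothetical candidate models, each a list of topics (top words only).
candidates = {
    "model_a": [["design", "pattern"], ["software", "complexity"]],
    "model_b": [["design", "coherence"], ["software", "model"]],
}

# Score each model by the mean coherence of its topics; higher is better.
scores = {
    name: sum(umass_coherence(topic, docs) for topic in topics) / len(topics)
    for name, topics in candidates.items()
}
best = max(scores, key=scores.get)
```

A coherence score only ranks candidates; the statement above pairs it with perplexity and a manual review of the resulting papers.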