2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA) 2017
DOI: 10.1109/dsaa.2017.61

Full-Text or Abstract? Examining Topic Coherence Scores Using Latent Dirichlet Allocation

Abstract: This paper assesses topic coherence and human topic ranking of latent topics uncovered from scientific publications when applying the topic model latent Dirichlet allocation (LDA) to abstract and full-text data. The coherence of a topic, used as a proxy for topic quality, is based on the distributional hypothesis, which states that words with similar meaning tend to co-occur within a similar context. Although LDA has gained much attention from machine-learning researchers, most notably with its adaptations and …
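The abstract treats coherence as a proxy for topic quality, grounded in how often a topic's words co-occur. As a minimal sketch, assuming the UMass measure (one common document co-occurrence coherence; this excerpt does not say which measure the paper actually used), a topic scores higher when its top words tend to appear in the same documents. The function name `umass_coherence` and the toy corpus are hypothetical.

```python
import math
from typing import Iterable, Sequence


def umass_coherence(top_words: Sequence[str], documents: Iterable[Iterable[str]]) -> float:
    """UMass-style coherence of one topic (illustrative sketch, not the paper's code).

    top_words must be ordered from most to least probable under the topic.
    Score = sum over i > j of log((D(w_i, w_j) + 1) / D(w_j)), where D(...)
    counts the documents that contain all of the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words: str) -> int:
        # Number of documents containing every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                raise ValueError(f"word {top_words[j]!r} does not occur in the corpus")
            # +1 smoothing avoids log(0) when the pair never co-occurs.
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / d_j)
    return score


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["fish", "stock", "catch"],
    ["fish", "stock"],
    ["gear", "catch"],
    ["fish", "gear"],
]
print(umass_coherence(["fish", "stock", "catch"], docs))  # log(2/3), about -0.405
print(umass_coherence(["stock", "gear"], docs))           # log(1/2), about -0.693
```

Scores are log-ratios, so values closer to zero indicate more coherent topics. In this toy corpus, "fish, stock, catch" co-occur and outscore "stock, gear", which never share a document.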

Cited by 266 publications (192 citation statements)
References 28 publications
“…To uncover latent topics, the topic model method latent Dirichlet allocation (LDA) (Blei, ; Blei, Ng, & Jordan, ) was used. All pre‐processing steps to suitably prepare documents for statistical topical inference (Hoffman, Blei, & Bach, ) are described in our previous work (Syed et al, ), which are highly optimised for the fisheries domain (Syed & Spruit, , , ). With LDA, the number of topics needs to be specified in advance, analogous to most unsupervised methods such as k‐means clustering or Gaussian mixture models.…”
Section: Methods (mentioning; confidence: 99%)
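The quoted methods passage notes that LDA, like k-means, needs the number of topics K fixed before fitting. A minimal sketch, assuming a collapsed Gibbs sampler (a standard LDA inference technique swapped in here for illustration; it is not necessarily what the citing study or Hoffman, Blei & Bach's online variational method uses), shows where K enters: it sizes every count table. The function name `lda_gibbs` and the default values of `alpha`, `beta`, and `iterations` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def lda_gibbs(
    docs: Sequence[Sequence[int]],
    vocab_size: int,
    num_topics: int,
    alpha: float = 0.1,
    beta: float = 0.01,
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Collapsed Gibbs sampler for LDA (sketch). Returns (theta, phi)."""
    if num_topics < 1:
        raise ValueError("num_topics (K) must be chosen in advance and be >= 1")
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dk = [[0] * K for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens assigned to each topic
    z: List[List[int]] = []             # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Point estimates of document-topic (theta) and topic-word (phi) distributions.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)]
        for k in range(K)
    ]
    return theta, phi


# Hypothetical corpus of word ids: words 0-1 and 2-3 tend to co-occur.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1], [2, 3, 3]]
theta, phi = lda_gibbs(docs, vocab_size=4, num_topics=2, iterations=100, seed=1)
```

Because K is a required argument with no data-driven default, practitioners typically fit several values of K and compare them, for example by topic coherence.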
“…We downloaded full‐text research articles published in the 21 journals covering fisheries aspects for a time span of 26 years (1990–2016) to allow for enough variation in publication trends. Analysing full‐text articles, compared to just abstract data, results in more detailed and higher quality topics (Syed & Spruit, ). Only research articles were considered, and other types of publications, such as errata, conference reports, forewords, announcements, dedications, letters, comments, and book reviews, were excluded.…”
Section: Methods (mentioning; confidence: 99%)
“…Analysing full-text articles, compared to just abstract data, results in more detailed and higher quality topics (Syed & Spruit, 2017).…”
Section: Creating the Data Set (mentioning; confidence: 99%)
“…In total, 22,236 full-text research articles from 13 top-tier fisheries journals were downloaded using automated download scripts, as well as by utilizing the available application programming interfaces (APIs) offered by the publishers. The use of full-text articles, in contrast to only using abstracts, has shown to increase topic quality and provide a more detailed overview of the latent topics permeating a document collection (Syed and Spruit, 2017). Table 1 provides an overview of the complete dataset utilized in this study.…”
Section: Creating the Dataset (mentioning; confidence: 99%)