2021
DOI: 10.15575/join.v6i1.636

Discovering Computer Science Research Topic Trends using Latent Dirichlet Allocation

Abstract: Before conducting a research project, researchers must find the trends and state of the art in their research field. However, that is not necessarily an easy job for researchers, partly due to the lack of specific tools to filter the required information by time range. This study aims to provide a solution to that problem by performing a topic modeling approach to the scraped data from Google Scholar between 2010 and 2019. We utilized Latent Dirichlet Allocation (LDA) combined with Term Frequency-Indexed Docum…

Cited by 5 publications (4 citation statements)
References 20 publications (20 reference statements)
“…Below is an array containing 30 sample data entries that represent the overall outcome of preprocessing: ['accelerating', 'academiaedu', 'analysing', 'apa', 'citation', 'cite', 'chicago', 'downloaded', 'formation', 'get', 'health', 'international', 'mla', 'patterns', 'public', 'pulmonary', 'rainfall', 'research', 'related', 'spatial', 'visual', 'hendra', 'rohman', 'science', 'paper', 'styles', 'tuberculosis', 'papers', 'world'] C. Exploratory Data Analysis (EDA) [36]: Calculating Coherence Score In this phase, the selection of the number of topics is based on the coherence score. The processing weighting of word analysis using Term Frequency-Inverse Document Frequency technique to reduce unnecessary words, vocabulary and eliminating noise [22]. which indicates the model's capacity to present data in a comprehensible manner for humans.…”
Section: B Data Preprocessing
Citation type: mentioning
confidence: 99%
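
The citation statement above describes using Term Frequency-Inverse Document Frequency weighting to drop uninformative words before topic modeling. The following is a minimal sketch of that idea, not the cited authors' implementation: the function names `tfidf` and `drop_low_weight_terms`, the unsmoothed natural-log IDF formula, the threshold value, and the three toy documents (built from tokens in the quoted sample array) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: tf = count / document length,
    idf = ln(N / document frequency), with no smoothing.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def drop_low_weight_terms(docs, threshold):
    """Keep only tokens whose TF-IDF weight in their own document exceeds threshold."""
    weights = tfidf(docs)
    return [[t for t in doc if w[t] > threshold] for doc, w in zip(docs, weights)]


# Hypothetical toy corpus assembled from tokens in the quoted sample array.
docs = [
    ["research", "paper", "tuberculosis", "spatial"],
    ["research", "paper", "rainfall", "patterns"],
    ["research", "citation", "styles", "paper"],
]

# Terms that occur in every document get idf = ln(1) = 0 and are removed.
filtered = drop_low_weight_terms(docs, threshold=0.0)
print(filtered)
```

In this toy corpus, "research" and "paper" appear in all three documents, so their weight is zero and they are filtered out, while the document-specific terms survive. A real pipeline would more likely rely on an established library for both the weighting and the subsequent coherence-based choice of the number of topics.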
“…By adopting a KDF, we aim to not only assess the effectiveness of various topic modeling in digital health, but also to reveal nuanced insights that may have otherwise remained hidden. As we embark on this intellectual journey, our goal is to contribute to the refinement of methodologies in digital health research, offering a nuanced understanding of the strengths and limitations of different topic modeling approaches [22]. Through this KDF and comparative analysis, we aspire to empower researchers, practitioners, and stakeholders in the digital health domain with the knowledge needed to make informed decisions and drive advancements in the field.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…This process can generally be divided into two key stages: First, establishing a comprehensive set of emotional words is undertaken. Second, semantic proximity between emotion feature words is assessed, often employing techniques such as Similarity Calculation or Bootstrapping algorithms to derive an emotion semantic model [13,14].…”
Section: Topic Emotion Modelling
Citation type: mentioning
confidence: 99%
“…In line with the purpose of using the method in this study, Latent Dirichlet Allocation (LDA) [10] is one of the most widely used unsupervised learning methods in topic modelling; it helps determine the topic of a text through the hidden topics of a document [11,12]. In previous hotel review research [7], LDA was used to extract aspect and opinion terms from hotel reviews for the ABSA process.…”
Section: Introduction
Citation type: mentioning
confidence: 99%