Discovering Computer Science Research Topic Trends using Latent Dirichlet Allocation

Nastiti, Kartika Rizqi; Hidayatullah, Ahmad Fathan; Pratama, Ahmad R.

doi:10.15575/join.v6i1.636

Cited by 5 publications (4 citation statements)

References 20 publications (20 reference statements)


“…Below is an array containing 30 sample data entries that represent the overall outcome of preprocessing: ['accelerating', 'academiaedu', 'analysing', 'apa', 'citation', 'cite', 'chicago', 'downloaded', 'formation', 'get', 'health', 'international', 'mla', 'patterns', 'public', 'pulmonary', 'rainfall', 'research', 'related', 'spatial', 'visual', 'hendra', 'rohman', 'science', 'paper', 'styles', 'tuberculosis', 'papers', 'world'] C. Exploratory Data Analysis (EDA) [36]: Calculating Coherence Score. In this phase, the selection of the number of topics is based on the coherence score. The processing weighting of word analysis using Term Frequency-Inverse Document Frequency technique to reduce unnecessary words, vocabulary and eliminating noise [22]. … which indicates the model's capacity to present data in a comprehensible manner for humans.…”

Section: B. Data Preprocessing (mentioning)

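The statement above combines two steps: TF-IDF weighting to drop uninformative words, and a coherence score to choose the number of topics. The sketch below is a minimal pure-Python illustration under stated assumptions. It uses tf × log(N / df) for TF-IDF and the UMass coherence measure. The cited paper does not say which coherence measure it used, and the threshold, sample documents and helper names (filter_low_tfidf, umass_coherence) are hypothetical.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document, weight = tf * log(N / df)."""
    n_docs = len(docs)
    # document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        # relative term frequency times inverse document frequency
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights

def filter_low_tfidf(docs, threshold):
    """Keep only tokens whose TF-IDF weight in their own document is >= threshold."""
    return [[t for t in doc if w[t] >= threshold] for doc, w in zip(docs, tfidf(docs))]

def umass_coherence(top_words, docs):
    """UMass coherence of one topic's top words (ordered by decreasing probability).

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over pairs with j < i, where D counts the
    documents containing all given words. Assumes every top word occurs in docs.
    """
    doc_sets = [set(d) for d in docs]

    def doc_count(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            score += math.log((doc_count(top_words[i], top_words[j]) + 1) / doc_count(top_words[j]))
    return score

# Hypothetical tokenized documents, for illustration only.
docs = [["tuberculosis", "health", "public"],
        ["rainfall", "spatial", "patterns"],
        ["health", "public", "research"]]
print(filter_low_tfidf(docs, 0.2))  # [['tuberculosis'], ['rainfall', 'spatial', 'patterns'], ['research']]
print(umass_coherence(["health", "public"], docs))
```

To pick the number of topics, one would fit a model for each candidate k, average umass_coherence over that model's topics, and keep the k with the highest average. Gensim's CoherenceModel offers u_mass and c_v measures if a library implementation is preferred.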

“…By adopting a KDF, we aim to not only assess the effectiveness of various topic modeling in digital health, but also to reveal nuanced insights that may have otherwise remained hidden. As we embark on this intellectual journey, our goal is to contribute to the refinement of methodologies in digital health research, offering a nuanced understanding of the strengths and limitations of different topic modeling approaches [22]. Through this KDF and comparative analysis, we aspire to empower researchers, practitioners, and stakeholders in the digital health domain with the knowledge needed to make informed decisions and drive advancements in the field.…”

Section: Introduction (mentioning)


Unveiling Insights: A Knowledge Discovery Approach to Comparing Topic Modeling Techniques in Digital Health Research

Rohajawati, Rahayu, Misky et al., 2024, intensif


This paper introduces a knowledge discovery approach focused on comparing topic modeling techniques within digital health research. Knowledge discovery has been applied to massive data repositories (databases) and in various fields of study, where these techniques are used to find patterns in the data, determine which models and parameters might be suitable, and look for patterns of interest in a specific representational form. However, research on the use of Latent Dirichlet Allocation (LDA) and Pachinko Allocation Models (PAM) as generative probabilistic models in knowledge discovery is still limited, and this investigation delves into their utilization. The study's findings position PAM as the superior technique, with the greatest number of distinctive tokens per topic and the fastest processing time. Notably, PAM identifies 87 unique tokens across 10 topics, surpassing the 27 unique tokens identified by LDA Gensim. Furthermore, PAM processes 404 documents in 0.000118970870 seconds, in contrast to LDA Gensim's considerably longer processing time of 0.368770837783 seconds. Ultimately, PAM emerges as the optimal method for topic modeling in digital health research, offering the best efficiency of the compared methods in analyzing extensive digital health text data.
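The abstract compares models by the number of unique tokens across topics, but it does not define that count. The sketch below implements two plausible readings, both assumptions: distinct tokens across all topics' top-word lists, and tokens that appear in exactly one topic. It also includes a simple wall-clock timer. The function names and sample topics are hypothetical.

```python
import time
from collections import Counter

def distinct_tokens(topics):
    """Reading 1: number of distinct tokens across all topics' top-word lists."""
    return len({token for topic in topics for token in topic})

def topic_exclusive_tokens(topics):
    """Reading 2: number of tokens that appear in exactly one topic's top-word list."""
    counts = Counter(token for topic in topics for token in set(topic))
    return sum(1 for c in counts.values() if c == 1)

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Hypothetical top words for two topics, for illustration only.
topics = [["health", "data", "patient"], ["health", "data", "app"]]
print(distinct_tokens(topics))         # 4
print(topic_exclusive_tokens(topics))  # 2
```

The abstract does not say which step produced the reported processing times. A fair comparison would wrap the same step, for example model fitting, in timed for both PAM and LDA.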


“…This process can generally be divided into two key stages: First, establishing a comprehensive set of emotional words is undertaken. Second, semantic proximity between emotion feature words is assessed, often employing techniques such as Similarity Calculation or Bootstrapping algorithms to derive an emotion semantic model [13,14].…”

Section: Topic Emotion Modelling (mentioning)

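The two stages described above are building an emotion word set and then assessing semantic proximity through similarity calculation or bootstrapping. The sketch below illustrates a similarity-based bootstrapping loop under stated assumptions. Words are compared by cosine similarity of word vectors, and any word close enough to the current lexicon is added, round by round, until no new words qualify. The vectors, threshold and names (bootstrap_lexicon, vectors) are hypothetical and are not taken from the cited work.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def bootstrap_lexicon(seeds, vectors, threshold, max_rounds=5):
    """Grow an emotion lexicon from seed words by repeated similarity expansion."""
    lexicon = {s for s in seeds if s in vectors}
    for _ in range(max_rounds):
        # candidates close enough to any word already in the lexicon
        new = {w for w, vec in vectors.items()
               if w not in lexicon and any(cosine(vec, vectors[s]) >= threshold for s in lexicon)}
        if not new:
            break
        lexicon |= new
    return lexicon

# Hypothetical 2-D word vectors, for illustration only.
vectors = {"happy": [1.0, 0.0], "glad": [0.95, 0.31], "cheerful": [0.8, 0.6], "sad": [0.0, 1.0]}
print(sorted(bootstrap_lexicon({"happy"}, vectors, 0.9)))  # ['cheerful', 'glad', 'happy']
```

Here "cheerful" joins only in the second round, through its similarity to "glad". This chaining effect is what distinguishes bootstrapping from a single similarity pass.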

An Emotional Analysis of Korean Topics based on Social Media Big Data Clustering

Jin, 2024, SCPE


This paper introduces an approach to address the challenges of interpreting emotional topics and accurately assessing emotional situations. It uses large-scale social media data to improve the accuracy of emotional analysis in online debates, with a specific emphasis on Korean themes. The proposed solution, the Online Topic Emotion Recognition Model (OTSRM), builds on the Online Latent Dirichlet Allocation (OLDA) model. OTSRM integrates the concept of emotion intensity and introduces an emotion iteration framework to tackle these issues. Its key innovations include establishing an affective evolution channel by augmenting affective heritability through a β prior. Additionally, the model generates two distribution matrices, one for characteristic words and another for affective words, which give a deeper understanding of the emotional context within topics. The relative entropy method is used to discern emotional tones in textual content by calculating maximum emotion values for topic focus within adjacent time segments. Validation experiments on five diverse network event datasets, with comparisons against mainstream models, demonstrate OTSRM's effectiveness, with emotion recognition accuracy rates of 85.56% and 81.03%. OTSRM represents significant progress on emotional topic analysis and precise assessment of emotional dynamics in Korean social media data.
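The abstract uses relative entropy across adjacent time segments. The sketch below shows the quantity itself, D_KL(P || Q) = Σ p_i log(p_i / q_i), applied to a topic's emotion distribution in two consecutive time slices. It also picks the emotion with the maximum value in each slice. This illustrates relative entropy only and is not OTSRM's actual procedure. The labels, distributions and helper names are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.

    p and q are probability vectors over the same emotion labels; terms with
    p_i == 0 contribute 0, and eps guards against q_i == 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def dominant_emotion(dist, labels):
    """Label with the maximum emotion value in a distribution."""
    return labels[max(range(len(dist)), key=dist.__getitem__)]

labels = ["joy", "anger", "sadness"]
slice_t = [0.6, 0.3, 0.1]     # hypothetical topic-emotion distribution at time t
slice_next = [0.2, 0.6, 0.2]  # hypothetical distribution at time t + 1
shift = kl_divergence(slice_next, slice_t)
print(dominant_emotion(slice_t, labels), dominant_emotion(slice_next, labels), round(shift, 3))
```

Relative entropy is asymmetric, so kl_divergence(slice_next, slice_t) generally differs from kl_divergence(slice_t, slice_next). Any model using it has to fix a direction.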


“…In line with the purpose of using the method in this study, latent Dirichlet allocation (LDA) [10] is one of the most widely used unsupervised learning methods in topic modelling; it helps determine the topic of a text through the hidden topics of a document [11,12]. In previous hotel review research [7], LDA is used to extract aspect and opinion terms from hotel reviews for the ABSA process.…”

Section: Introduction (mentioning)

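The statement above describes LDA as an unsupervised method that recovers hidden topics from documents. The sketch below is a minimal, unoptimized collapsed Gibbs sampler for LDA in pure Python. It shows how per-document topic mixtures and per-topic word distributions are estimated. Libraries such as Gensim use other inference methods (online variational Bayes), so this is an illustration of the model rather than any cited implementation. The hyperparameters, sample documents and names (lda_gibbs, top_words) are assumptions.

```python
import random

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized docs.

    Returns (vocab, doc_topic, topic_word): the sorted vocabulary, one topic
    distribution per document, and one word distribution per topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                       # tokens assigned to each topic
    z = []                                    # topic assignment of every token
    for d, doc in enumerate(docs):            # random initial assignments
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1; nkw[k][w_id[w]] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], w_id[w]
                ndk[d][k] -= 1; nkw[k][v] -= 1; nk[k] -= 1  # remove current token
                # full conditional p(z_i = t | all other assignments), unnormalized
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][v] += 1; nk[k] += 1
    doc_topic = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
                 for d, doc in enumerate(docs)]
    topic_word = [[(nkw[t][v] + beta) / (nk[t] + V * beta) for v in range(V)]
                  for t in range(n_topics)]
    return vocab, doc_topic, topic_word

def top_words(vocab, topic_word, n=3):
    """Top-n words of each topic by probability."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in topic_word]

# Hypothetical hotel-review tokens, for illustration only.
docs = [["hotel", "room", "clean", "bed"], ["staff", "friendly", "service", "hotel"],
        ["room", "bed", "clean"], ["service", "staff", "friendly"]]
vocab, doc_topic, topic_word = lda_gibbs(docs, n_topics=2, iters=100)
print(top_words(vocab, topic_word, n=3))
```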

Aspect-Based Sentiment Analysis for Hotel Review Using LDA, Semantic Similarity, and BERT

2022, IJIES


Hotel reviews are frequently used as the main input in sentiment analysis. The aim is to help travellers easily find more accurate information about hotel aspects when selecting hotels for their journeys. Based on review datasets, hotel service organizers can evaluate guests' responses to the services the hotels provide, and can also identify which hotel aspects need improvement for future stays. A common problem is that the amount of processed data is not limited to a small scale, so terms are often extracted incorrectly from review documents. These terms are the main input for aspect categorization and aspect-based sentiment analysis, so methods are needed that perform both tasks automatically, at large scale, and with good accuracy. In this study, the pre-processed reviews were first processed using TF-ICF to obtain terms based on aspect keyword variables in each hotel aspect category. Next, LDA was used to obtain the hidden topic of each term, with the aim of improving term accuracy. Aspect categorization was then carried out using BERT embeddings and semantic similarity, aiming to obtain more significant differences in similarity across aspect categories so that the aspect categories of a review could be determined more accurately. The aspect categorization achieved a precision of 0.86, recall of 0.92, and F1-measure of 0.89. The BERT sentiment analysis method was then used for aspect-based sentiment analysis, which achieved a precision of 0.96, recall of 0.98, and F1-measure of 0.97.
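The abstract's first step weights review terms with TF-ICF (term frequency × inverse category frequency), the category-level analogue of TF-IDF. The sketch below assumes one common formulation, tf(t, c) × log(|C| / cf(t)), where cf(t) is the number of aspect categories containing t. The paper's exact variant is not given here, and the categories, tokens and names (tf_icf, top_terms) are hypothetical.

```python
import math
from collections import Counter

def tf_icf(category_terms):
    """category_terms: {category: [tokens from reviews in that category]}.

    Returns {category: {term: tf * log(|C| / cf)}}, where tf is the raw count of
    the term in the category and cf is the number of categories containing it.
    """
    n_cat = len(category_terms)
    cf = Counter(t for terms in category_terms.values() for t in set(terms))
    scores = {}
    for cat, terms in category_terms.items():
        tf = Counter(terms)
        scores[cat] = {t: c * math.log(n_cat / cf[t]) for t, c in tf.items()}
    return scores

def top_terms(scores, category, n=2):
    """Highest-scoring terms for one category."""
    return sorted(scores[category], key=lambda t: -scores[category][t])[:n]

# Hypothetical aspect categories and review tokens, for illustration only.
categories = {
    "room": ["bed", "clean", "clean", "hotel"],
    "service": ["staff", "clean", "hotel"],
    "food": ["breakfast", "tasty", "hotel"],
}
scores = tf_icf(categories)
print(top_terms(scores, "room"))  # ['bed', 'clean']
```

A term that appears in every category, like "hotel" here, scores 0 because log(|C| / |C|) = 0. This is the intended effect: terms that cannot separate aspect categories are suppressed.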
