2021 · DOI: 10.2478/jdis-2021-0024
Embedding-based Detection and Extraction of Research Topics from Academic Documents Using Deep Clustering

Abstract: Purpose: Detecting research fields or topics and understanding their dynamics helps the scientific community make decisions about establishing scientific fields, and also supports better collaboration with governments and businesses. This study aims to investigate the development of research fields over time, framing it as a topic detection problem. Design/methodology/approach: To achieve the obj…

Cited by 6 publications (3 citation statements) · References 34 publications
“…Doc2Vec [10] is a document-level extension of Word2Vec that takes word order into account. Comparative studies by Radu et al. (2020) [13] and Vahidnia et al. (2021) [14] have shown that Doc2Vec embeddings combined with off-the-shelf clustering algorithms such as K-means and DBSCAN [22], as well as deep embedded clustering [15], improve the accuracy of document clustering on scientific publications and outperform classical bag-of-words representations. However, Word2Vec and Doc2Vec generate only one vector per word, which fails to capture different senses of a word: the word “bank”, for example, receives the same vector whether it appears in “river bank” or “commercial bank”.…”
Section: Pretrained Language Models and Applications
confidence: 99%
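The statement above contrasts static word vectors with contextual ones. The sketch below is not taken from the cited papers; it is a minimal illustration, assuming the Hugging Face `transformers` package and the public `bert-base-uncased` checkpoint (an illustrative model choice), of how a contextual model assigns different vectors to “bank” in its river sense and its financial sense, whereas a static Word2Vec/Doc2Vec vocabulary would give both occurrences the same vector.

```python
# Minimal sketch (not from the cited papers): contrasting a static word vector
# with contextual BERT vectors for the word "bank" in two different senses.
# Assumes the `transformers` package and the public `bert-base-uncased`
# checkpoint; the model choice is illustrative only.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def bank_vector(sentence: str) -> torch.Tensor:
    """Return the contextual hidden state of the token 'bank' in `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    idx = tokens.index("bank")  # 'bank' is a single WordPiece in this vocabulary
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)
    return hidden[0, idx]

v_river = bank_vector("They sat on the river bank and watched the water.")
v_money = bank_vector("She deposited the cheque at the commercial bank.")

# A static Word2Vec/Doc2Vec model would assign 'bank' the same vector in both
# sentences (cosine similarity 1.0); BERT's contextual vectors differ.
sim = torch.nn.functional.cosine_similarity(v_river, v_money, dim=0)
print(f"cosine similarity between the two 'bank' vectors: {sim.item():.3f}")
```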
“…There are a number of document embedding methods, ranging from bag-of-words (BoW), Word2Vec [9], and Doc2Vec [10] to the most recent transformer-based models such as BERT (Bidirectional Encoder Representations from Transformers) [11] and the GPT-3 similarity embeddings [12]. Radu et al. (2020) [13] and Vahidnia et al. (2021) [14] experimented with Doc2Vec embeddings and off-the-shelf clustering algorithms, such as K-means, hierarchical agglomerative clustering, and deep embedded clustering [15], on publication abstracts, and then used top TF-IDF terms to label each cluster. Their results showed that Doc2Vec embeddings improve the accuracy of the clustering algorithms.…”
Section: Introduction
confidence: 99%
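The statement above describes a pipeline of Doc2Vec embeddings, off-the-shelf clustering, and TF-IDF cluster labelling. The following is a minimal sketch of that general pipeline, not the cited authors' exact setup: the toy corpus, `vector_size`, and `n_clusters` are placeholders rather than the settings used in the cited studies.

```python
# Minimal sketch (not the cited authors' exact pipeline): embed abstracts with
# Doc2Vec, cluster with K-means, and label each cluster by its top TF-IDF terms.
# The corpus, vector_size, and n_clusters are illustrative placeholders.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

abstracts = [
    "deep learning for image classification and object detection",
    "convolutional neural networks applied to medical imaging",
    "graph based ranking of scientific publications and citations",
    "citation network analysis for measuring research impact",
]

# 1) Document embeddings with Doc2Vec.
tagged = [TaggedDocument(words=a.split(), tags=[i]) for i, a in enumerate(abstracts)]
d2v = Doc2Vec(tagged, vector_size=50, window=5, min_count=1, epochs=40, seed=1)
X = np.vstack([d2v.dv[i] for i in range(len(abstracts))])

# 2) Off-the-shelf clustering of the document embeddings.
n_clusters = 2
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=1).fit(X)

# 3) Label each cluster with its highest-scoring TF-IDF terms.
tfidf = TfidfVectorizer()
T = tfidf.fit_transform(abstracts)
terms = np.array(tfidf.get_feature_names_out())
for c in range(n_clusters):
    rows = np.where(km.labels_ == c)[0]
    scores = np.asarray(T[rows].mean(axis=0)).ravel()
    top = terms[scores.argsort()[::-1][:3]]
    print(f"cluster {c}: {', '.join(top)}")
```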
“…The last paper, “Embedding-based Detection and Extraction of Research Topics from Academic Documents Using Deep Clustering” (Vahidnia, Abbasi, & Abbass, 2021), proposed a modified deep clustering method to detect research trends from the abstracts and titles of academic documents. The experimental results show that the modified DEC, in conjunction with Doc2Vec, can outperform other methods in the clustering task.…”
Section: Journal of Data and Information Science
confidence: 99%
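The statement above refers to the paper's modified deep embedded clustering (DEC). The sketch below shows generic DEC in the spirit of Xie et al. (2016) [15], not the paper's modification: an autoencoder is pretrained on reconstruction, cluster centres are initialised with K-means in the latent space, and the encoder and centres are then refined by minimising the KL divergence between soft assignments and a sharpened target distribution. The synthetic inputs, layer sizes, and epoch counts are placeholders standing in for Doc2Vec vectors of abstracts and titles.

```python
# Minimal sketch of generic deep embedded clustering (DEC), in the spirit of
# Xie et al. (2016) [15]; this is NOT the modified DEC of the cited paper.
# Layer sizes, epochs, and the synthetic inputs below are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=50, latent_dim=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def soft_assign(z, centers, alpha=1.0):
    """Student's t kernel between embedded points and cluster centres (soft assignment Q)."""
    d2 = torch.cdist(z, centers) ** 2
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(dim=1, keepdim=True)

def target_distribution(q):
    """Sharpened target P: emphasise high-confidence assignments."""
    w = q ** 2 / q.sum(dim=0)
    return w / w.sum(dim=1, keepdim=True)

# Placeholder inputs standing in for Doc2Vec vectors of abstracts/titles.
X = torch.randn(200, 50)
n_clusters = 5

# 1) Pretrain the autoencoder on reconstruction loss.
ae = AutoEncoder()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(100):
    z, x_hat = ae(X)
    loss = F.mse_loss(x_hat, X)
    opt.zero_grad(); loss.backward(); opt.step()

# 2) Initialise cluster centres with K-means on the latent space.
with torch.no_grad():
    z0, _ = ae(X)
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(z0.numpy())
centers = nn.Parameter(torch.tensor(km.cluster_centers_, dtype=torch.float32))

# 3) Jointly refine encoder and centres by minimising KL(P || Q).
#    (For brevity P is recomputed every step; DEC updates it periodically.)
opt = torch.optim.Adam(list(ae.encoder.parameters()) + [centers], lr=1e-3)
for _ in range(100):
    z, _ = ae(X)
    q = soft_assign(z, centers)
    p = target_distribution(q).detach()
    kl = F.kl_div(q.log(), p, reduction="batchmean")
    opt.zero_grad(); kl.backward(); opt.step()

with torch.no_grad():
    labels = soft_assign(ae.encoder(X), centers).argmax(dim=1)
print(labels[:20])
```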