Proceedings of the 2nd International Conference on Computing and Big Data 2019
DOI: 10.1145/3366650.3366661

Multi-class Document Classification Using Improved Word Embeddings

Cited by 7 publications (4 citation statements)
References 14 publications
“…While the total vocabulary with Word2Vec is around 15.8K, fastText has only 4.7K sub-words. Moreover, fastText shows only a 0.5%-1% improvement, as reported by Benedict et al. [56]. Therefore, we reverted to the older Word2Vec approach for pre-training the WE model, as it is easier to transfer the embedding matrix weights between the pre-trained and actual models.…”
Section: Discussion (mentioning)
confidence: 99%
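
The weight-transfer step this excerpt describes can be illustrated with a short sketch. The example below is not the authors' exact pipeline: it trains a small gensim Word2Vec model, copies its vectors into a matrix, and uses that matrix to initialise the Embedding layer of a downstream Keras classifier. The corpus, vector dimensions, and class count are placeholder assumptions.

```python
import numpy as np
from gensim.models import Word2Vec
from tensorflow import keras

# Placeholder corpus; the cited work pre-trains on its own domain text.
sentences = [["network", "intrusion", "detected"],
             ["normal", "traffic", "observed"]]

w2v = Word2Vec(sentences, vector_size=100, window=5, min_count=1)

# Copy the pre-trained vectors into a single weight matrix.
vocab_size = len(w2v.wv)
embedding_matrix = np.zeros((vocab_size, 100))
for i, word in enumerate(w2v.wv.index_to_key):
    embedding_matrix[i] = w2v.wv[word]

# The "actual" model reuses the matrix to initialise its Embedding layer.
model = keras.Sequential([
    keras.layers.Embedding(
        vocab_size, 100,
        embeddings_initializer=keras.initializers.Constant(embedding_matrix),
        trainable=False),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(4, activation="softmax"),  # 4 classes, assumed
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```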
“…The main objective of a document classification task is to assign each item of a text corpus to exactly one category (multiclass) or to one or more categories (multilabel) [2]. Based on the training data, the system can classify previously unseen items into their corresponding categories.…”
Section: Related Work, A. Multi-class Document Classification (mentioning)
confidence: 99%
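
As a concrete illustration of the multiclass setting described in this excerpt, here is a minimal sketch in which each document receives exactly one label. The 20NewsGroup dataset matches the related work cited below; the TF-IDF plus logistic regression classifier is an assumption chosen for brevity, not the method of the cited paper.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 20 categories; every document belongs to exactly one of them.
train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

clf = make_pipeline(TfidfVectorizer(max_features=20000),
                    LogisticRegression(max_iter=1000))
clf.fit(train.data, train.target)          # one label per document
print(clf.score(test.data, test.target))   # accuracy on unseen items
```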
“…[1] proposed a pairwise multiclass document classification approach for identifying relationships between Wikipedia articles, and SCDV-MS was presented in [6], which utilized multi-sense embeddings to improve multiclass classification on the 20NewsGroup dataset while also targeting a lower-dimensional representation than that of its predecessor SCDV. The 20NewsGroup dataset was also utilized in [2], in which an extension to the Word2Vec and FastText [12] word embedding algorithms is proposed. The word embeddings were augmented with semantic information by assigning a part-of-speech (POS) tag to each word, with the objective of evaluating the enhanced model's performance on a multiclass classification task.…”
Section: Related Work, A. Multi-class Document Classification (mentioning)
confidence: 99%
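
The POS-augmentation idea summarised in this excerpt, fusing each word with its part-of-speech tag before training embeddings, can be sketched as follows. This is a minimal illustration, not the cited extension itself: the NLTK tagger, the toy corpus, and the word_TAG token format are assumptions.

```python
import nltk
from gensim.models import Word2Vec

# One-time downloads for the tokenizer and tagger models
# (resource names vary slightly across NLTK versions).
for res in ("punkt", "punkt_tab",
            "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(res, quiet=True)

docs = ["They run the experiments daily.",
        "The morning run went smoothly."]

# Fuse each token with its POS tag so that, e.g., the verb and noun
# senses of "run" become distinct vocabulary entries.
tagged_corpus = []
for doc in docs:
    tokens = nltk.word_tokenize(doc)
    tagged_corpus.append([f"{w.lower()}_{pos}"
                          for w, pos in nltk.pos_tag(tokens)])
# e.g. ["they_PRP", "run_VBP", ...] vs ["the_DT", "morning_NN", "run_NN", ...]

model = Word2Vec(tagged_corpus, vector_size=50, min_count=1)
# Each word_TAG combination now holds its own embedding vector.
```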
“…This method is particularly effective at capturing semantic and contextual connections between words because it is trained on a vast corpus of data. Pre-trained word embeddings can improve the accuracy of learning models when the data domain is consistent with the corpus used for training (ALRashdi and O'Keefe, 2019; Asudani et al., 2023; Rabut et al., 2019). Conversely, custom-trained embeddings are trained solely on the specified datasets (Sabbeh and Fasihuddin, 2023).…”
Section: Introduction (mentioning)
confidence: 99%
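
The contrast this excerpt draws between pre-trained and custom-trained embeddings can be made concrete with a short sketch. The pre-trained model name (glove-wiki-gigaword-50 from gensim's downloader) and the toy domain corpus are illustrative assumptions, not the choices of the cited studies.

```python
import gensim.downloader as api
from gensim.models import Word2Vec

# Pre-trained: broad general-domain coverage, useful when the task
# domain is consistent with the training corpus.
pretrained = api.load("glove-wiki-gigaword-50")
print(pretrained.most_similar("classification", topn=3))

# Custom-trained: vectors learned only from the task's own dataset
# (placeholder domain corpus below).
domain_corpus = [["tweet", "reports", "flood", "damage"],
                 ["earthquake", "felt", "downtown"]]
custom = Word2Vec(domain_corpus, vector_size=50, min_count=1)
print(custom.wv.most_similar("flood", topn=2))
```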