2020
DOI: 10.20944/preprints202012.0600.v1
Preprint
Towards Bengali Word Embedding: Corpus Creation, Intrinsic and Extrinsic Evaluations

Abstract: Distributional word vector representation, or word embedding, has become an essential ingredient in many natural language processing (NLP) tasks such as machine translation, document classification, information retrieval and question answering. Investigation of embedding models helps to reduce the feature space and improves textual semantic as well as syntactic relations. This paper presents three embedding techniques (Word2Vec, GloVe, and FastText) with different hyperparameters implemented on a Bengali cor…
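The distributional idea behind the embeddings in the abstract can be illustrated without any of the three libraries the paper uses: words that occur in similar contexts receive similar vectors. A minimal sketch, assuming a toy corpus and a window size of 2 (the paper's actual models are Word2Vec, GloVe, and FastText trained on a Bengali corpus; the corpus and helper names below are illustrative only):

```python
from collections import defaultdict
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """Build sparse distributional vectors from co-occurrence counts in a window."""
    vecs = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[w][tokens[j]] += 1
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat and a dog played".split(),
]
vecs = cooccurrence_vectors(corpus)
# "cat" and "dog" share contexts, so their vectors are more similar
# than "cat" and "rug":
print(cosine(vecs["cat"], vecs["dog"]) > cosine(vecs["cat"], vecs["rug"]))
# prints: True
```

Intrinsic evaluation, as used in the paper, scores exactly this kind of pairwise similarity against human judgements; extrinsic evaluation instead feeds the vectors into a downstream task such as document classification.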

Cited by 11 publications (6 citation statements) · References 18 publications
“…The experiment was performed on 180 million Bengali words. Extrinsic performance provides better results than intrinsic performance, as discussed in a previous study [2]. A Bengali text document classifier was developed using GloVe embedding and a very deep convolution neural network (VDCNN).…”
Section: Introduction (mentioning)
confidence: 89%
“…One notable application of text classification is evaluating word vector representations trained on foreign languages, a domain where standardized intrinsic procedures are yet to be established. For instance, Hossain et al [117] utilized text classification to evaluate embeddings derived from a Bengali corpus encompassing 180 million word tokens. The authors developed their classification model using Convolutional Neural Network (CNN) architecture.…”
Section: Domain-knowledge Extrinsic Evaluation: Concepts and Model Ar... (mentioning)
confidence: 99%
“…The proposed authorship classification system is evaluated in three ways: embedding model evaluation, training phase evaluation and testing phase evaluation. Embedding model evaluation refers to judging the quality of feature vectors, which is an essential task for low-resource languages [60]. Intrinsic and extrinsic evaluations are used for evaluating the embedding model.…”
Section: A. Evaluation Measures (mentioning)
confidence: 99%
“…Various combinations of hyperparameters of three embedding techniques (GloVe, FastText and Word2Vec) generated 90 local contextual embedding models [18 for GloVe, 36 for FastText (Skip-gram and CBOW), 36 for Word2Vec (Skip-gram and CBOW)]. Intrinsic evaluators are used to evaluate all 90 models using syntactic and semantic similarity measures [60]. Based on the intrinsic evaluation performance, the 9 top-performing embedding models are selected to perform the downstream task (i.e., authorship classification).…”
Section: A. Embedding Models Evaluation (mentioning)
confidence: 99%
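The 90-model count quoted above follows directly from the grid arithmetic: one base grid of hyperparameter combinations gives 18 GloVe models, and doubling that grid over the two architectures (Skip-gram and CBOW) gives 36 models each for FastText and Word2Vec. A minimal sketch, assuming an illustrative 3 × 6 grid of vector dimensions and window sizes (the actual hyperparameter values used in [60] are not given in this excerpt):

```python
from itertools import product

# Illustrative hyperparameter grid; the concrete values are assumptions.
dims = [100, 200, 300]
windows = [2, 5, 7, 10, 12, 15]
base_grid = list(product(dims, windows))  # 3 * 6 = 18 combinations

# GloVe has no Skip-gram/CBOW choice; Word2Vec and FastText have both.
glove_models = [("GloVe", d, w, None) for d, w in base_grid]
fasttext_models = [("FastText", d, w, a)
                   for d, w in base_grid for a in ("skip-gram", "cbow")]
word2vec_models = [("Word2Vec", d, w, a)
                   for d, w in base_grid for a in ("skip-gram", "cbow")]

models = glove_models + fasttext_models + word2vec_models
print(len(glove_models), len(fasttext_models), len(word2vec_models), len(models))
# prints: 18 36 36 90
```

Selecting the 9 top performers would then sort `models` by their intrinsic syntactic/semantic similarity scores (not reproduced here) and keep the first nine for the downstream authorship-classification task.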