2016 19th International Conference on Computer and Information Technology (ICCIT)
DOI: 10.1109/iccitechn.2016.7860236
Bengali word embeddings and it's application in solving document classification problem

Cited by 27 publications (10 citation statements)
References 7 publications
“…In future work, we would like to exploit the local contexts and topic structure of the n-grams to identify the dispersion, and would like to develop a supervised information-theoretic technique to efficiently aggregate the dispersed features in a preprocessing step to the supervised dimension reduction step. The local context and topic structure have been successfully exploited previously in text analysis in various ways [1,2,6,22,25,34]. Bringing awareness of these structures to information extraction would be a valuable addition to this pipeline, which uses the interdependence between the n-gram features and class-labels to obtain a low-dimensional discriminatory document representation.…”
Section: Discussion (confidence: 99%)
“…When p > n, or the explanatory variables are highly collinear, the sample covariance matrix Σ is singular. The precursor work [11] overcomes this rank deficiency with ridge regularization, adding a diagonal matrix sI_p to the sample covariance matrix in (1), where s is the ridge regularization parameter and I_p is the identity matrix. The precursor work [11] also uses a variant of the regularized SIR method; in particular, it uses the localized sliced inverse regression (LSIR) [33].…”
Section: Methods (confidence: 99%)
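The ridge fix described in the excerpt above can be sketched in a few lines of numpy; the dimensions and the value of s below are illustrative, not taken from the cited work:

```python
import numpy as np

# When p > n the sample covariance is rank-deficient; adding s * I_p
# (the ridge term from the excerpt) restores full rank and invertibility.
rng = np.random.default_rng(0)
n, p, s = 10, 50, 0.1                 # illustrative sizes: p > n
X = rng.standard_normal((n, p))

Xc = X - X.mean(axis=0)               # center the columns
Sigma = Xc.T @ Xc / n                 # sample covariance, rank <= n - 1 < p
Sigma_reg = Sigma + s * np.eye(p)     # ridge-regularized covariance

print(np.linalg.matrix_rank(Sigma))      # 9  (singular)
print(np.linalg.matrix_rank(Sigma_reg))  # 50 (full rank)
```

Since every eigenvalue of `Sigma_reg` is at least s, the matrix is invertible even though `Sigma` itself is not.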
“…In another study, Amin et al [21] performed sentiment analysis on Bengali comments, which were analyzed using the Word2Vec approach and achieved 75.5% in each of the two classes. Ahmad et al [22] used Word2Vec for the Bengali document classification problem using Support Vector Machine (SVM) and obtained an F1-score of almost 91%.…”
Section: Bengali Word Embedding (confidence: 99%)
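The Word2Vec-plus-SVM pipeline the excerpt attributes to Ahmad et al. [22] can be sketched as follows. Everything here is an illustrative stand-in: the toy word vectors replace trained Bengali Word2Vec embeddings, the two-class corpus is invented, and a tiny subgradient trainer replaces the off-the-shelf SVM the study would have used:

```python
import numpy as np

# Toy stand-ins for trained Bengali word vectors (words, 2-d vectors,
# labels, and documents below are illustrative, not from the paper).
emb = {
    "game":  np.array([ 1.0,  0.0]), "ball": np.array([ 1.0,  1.0]),
    "team":  np.array([ 1.0, -1.0]), "law":  np.array([-1.0,  0.0]),
    "judge": np.array([-1.0,  1.0]), "writ": np.array([-1.0, -1.0]),
}

def doc_vector(tokens):
    """Represent a document as the mean of its word vectors (OOV words skipped)."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

docs = [["game", "ball"], ["team", "game"], ["law", "judge"], ["writ", "law"]]
y = np.array([1, 1, -1, -1])          # +1 = sports, -1 = legal (toy labels)
X = np.stack([doc_vector(d) for d in docs])

# Minimal linear SVM fitted by full-batch subgradient descent on the
# regularized hinge loss; a stand-in for a standard SVM solver.
w, lam = np.zeros(2), 0.1
for t in range(1, 501):
    eta = 1.0 / (lam * t)             # decaying step size
    viol = y * (X @ w) < 1            # documents violating the margin
    grad = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / len(y)
    w -= eta * grad

preds = np.sign(X @ w)
print((preds == y).mean())            # → 1.0 on this toy training set
```

Averaging word vectors discards word order but yields a fixed-length document representation, which is what makes a linear classifier such as an SVM directly applicable.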
“…Word Embedding tools, technologies and pre-trained models are widely available for resource-rich languages such as English (Mikolov et al., 2013; Pennington et al., 2014) and Chinese (Li et al., 2018; Chen et al., 2015). Due to the wide use of Word Embeddings, pre-trained models are increasingly available for resource-poor languages such as Portuguese (Hartmann et al., 2017), Arabic (Elrazzaz et al., 2017; Soliman et al., 2017), and Bengali (Ahmad and Amin, 2016).…”
Section: Related Work (confidence: 99%)