Named Entity Recognition (NER) is a task in Natural Language Processing (NLP) that aims to classify words into a predefined set of Named Entity (NE) classes. Many architectures have produced good results on high-resource languages such as English and Chinese. However, NER has made little progress so far for Bangla, a low-resource language. In this paper, we perform NER on Bangla using Word2Vec embeddings and contextual embeddings from Bidirectional Encoder Representations from Transformers (BERT). We propose multiple BERT-based deep learning models that take the contextualized BERT embeddings as input, together with a simple statistical approach to class-weighted cost-sensitive learning. The modified cost-sensitive loss function addresses the class imbalance in the data: we penalize the dominant classes by taking each class's ratio with respect to the sample count of the largest class rather than the size of the whole dataset. This penalty makes the learner learn more slowly for the dominant classes. In addition, we experiment with adding a Conditional Random Field (CRF) layer and with incorporating Focal Loss into the training process. Our best results are an F1 Macro score of 65.96%, an F1 Micro score of 90.64%, and an F1 Message Understanding Conference (MUC) score of 72.04%, all calculated at the named-entity level. Our experimental results demonstrate that one of the proposed models, which jointly optimizes the CRF loss and the class-weighted cost-sensitive loss derived from our statistical approach, achieves an improvement of over 8% in F1 MUC score on a recently introduced Bangla NER dataset compared to previously published work.
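To make the class-weighting idea concrete, the following is a minimal sketch of a cost-sensitive cross-entropy in which each class weight is a ratio taken against the largest class rather than the whole dataset. The abstract does not give the exact formula, so the form w_c = n_max / n_c used here is an assumption, as are the function names `class_weights_from_max` and `weighted_cross_entropy` and the toy tag set; it should not be read as the paper's implementation.

```python
import math
from collections import Counter


def class_weights_from_max(labels):
    """Assumed weighting: w_c = n_max / n_c, where n_max is the size of the
    largest class. The dominant class gets weight 1.0; rarer classes get
    proportionally larger weights. (Illustrative form, not from the paper.)"""
    counts = Counter(labels)
    n_max = max(counts.values())
    return {tag: n_max / n for tag, n in counts.items()}


def weighted_cross_entropy(probs, gold, weights):
    """Mean over tokens of -w[y] * log p(y).

    probs: one dict per token mapping tag -> predicted probability
    gold:  the gold tag of each token
    """
    total = 0.0
    for p, y in zip(probs, gold):
        total += -weights[y] * math.log(p[y])
    return total / len(gold)


# Toy, heavily imbalanced tag sequence (hypothetical data).
tags = ["O"] * 8 + ["PER"] * 2 + ["LOC"]
weights = class_weights_from_max(tags)

# A dummy model that assigns probability 0.5 to the gold tag of every token.
probs = [{t: 0.5} for t in tags]
loss = weighted_cross_entropy(probs, tags, weights)
```

Under this assumed form, an error on a rare tag such as `LOC` contributes eight times as much to the loss as an equally confident error on `O`, so gradient updates driven by the dominant class are comparatively small.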