Text classification has played a key role in various fields, such as news classification, spam detection, and sentiment analysis. However, the classification of crime news continues to pose challenges, including low efficiency, low precision, and the scarcity of large-scale, high-quality annotated data. Using pre-trained language models, such as Bidirectional Encoder Representations from Transformers (BERT), has reduced the need for extensive amounts of labelled data in the categorization process. BERT offers strong contextual representations and excels in text classification tasks, particularly when labelled data is limited. A BERT-based pre-trained language model was applied to categorize crimes using information gathered from Malaysian online newspapers, overcoming the shortage of high-quality, large-scale crime-related labelled data. The crime-related labelled dataset used for training this model was compiled from BERNAMA (the Malaysian National News Agency) and manually labelled by crime investigation experts into 12 categories, including a non-crime class. The experimental results showed that the BERT-based model outperformed previous models, achieving the highest performance with an accuracy of 99.45%. This highlights the efficacy of BERT in classifying crime news, even with a small dataset.
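The approach described above, fine-tuning a pre-trained BERT encoder with a 12-way classification head, can be sketched with the Hugging Face Transformers library as below. This is a minimal illustration, not the authors' implementation: the abstract does not name the 12 categories (only that a non-crime class is included), so the label names here are hypothetical placeholders, and the checkpoint name is an assumption.

```python
# Hedged sketch: setting up a BERT model for 12-way crime news
# classification. The category names below are hypothetical placeholders;
# the paper only states that there are 12 classes, one of them non-crime.

LABELS = [
    "non-crime", "theft", "robbery", "assault", "murder", "fraud",
    "drugs", "kidnapping", "sexual-crime", "vandalism", "arson", "cybercrime",
]
label2id = {name: i for i, name in enumerate(LABELS)}
id2label = {i: name for name, i in label2id.items()}


def build_classifier(model_name: str = "bert-base-cased"):
    """Load a pre-trained BERT encoder with a fresh 12-way classification
    head; the head's weights are randomly initialised and would be trained
    on the labelled crime news dataset."""
    # Imported lazily so the label bookkeeping above works without the
    # (large) transformers dependency installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(LABELS),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

The model returned by `build_classifier` would then be fine-tuned on the expert-labelled BERNAMA articles with a standard cross-entropy objective, which is the usual recipe for BERT-based text classification with limited labelled data.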