Text classification plays a key role in many fields, such as news classification, spam detection, and sentiment analysis. However, the classification of crime news continues to pose challenges, including low efficiency, low precision, and the scarcity of large-scale, high-quality annotated data. Pre-trained language models, such as Bidirectional Encoder Representations from Transformers (BERT), have reduced the need for extensive amounts of labelled data in the categorization process. BERT offers strong contextual representations and excels in text classification tasks, particularly when labelled data is limited. To overcome the shortage of high-quality, large-scale crime-related labelled data, a BERT-based pre-trained language model was applied to categorize crime news gathered from Malaysian online newspapers. The labelled dataset used to train this model was compiled from BERNAMA (the Malaysian National News Agency) and manually labelled by crime investigation experts into 12 categories, including a non-crime class. The experimental results showed that the BERT-based model outperformed previous models, achieving the highest performance with an accuracy of 99.45%. This highlights the efficacy of BERT in classifying crime news, even with a small dataset.
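
As a concrete illustration, the sketch below shows how such a BERT-based classifier might be fine-tuned for the 12 categories using the Hugging Face Transformers library. This is a minimal sketch, not the paper's exact pipeline: the checkpoint name (bert-base-uncased), the example headline, the label index, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of fine-tuning a BERT classifier for 12 crime-news
# categories (11 crime classes plus a non-crime class, per the paper).
# Checkpoint, example text, label, and hyperparameters are assumptions.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

NUM_LABELS = 12

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_LABELS
)

# Hypothetical headline; the real training data came from BERNAMA articles.
texts = ["Police arrest three suspects in armed robbery case"]
labels = torch.tensor([2])  # illustrative label index

inputs = tokenizer(
    texts, padding=True, truncation=True, max_length=128, return_tensors="pt"
)

# One fine-tuning step: the forward pass returns a cross-entropy loss
# over the 12 logits when labels are supplied.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
optimizer.zero_grad()
outputs = model(**inputs, labels=labels)
outputs.loss.backward()
optimizer.step()

# Inference: the predicted category is the arg-max over the 12 logits.
model.eval()
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))
```

In practice this step would run over the full labelled dataset for several epochs; the classification head on top of BERT's pooled representation is what allows strong accuracy even with relatively little labelled data.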