Text classification plays a key role in many fields, such as news classification, spam detection, and sentiment analysis. However, the classification of crime news continues to pose challenges, including low efficiency, low precision, and the scarcity of large-scale, high-quality annotated data. Pre-trained language models, such as Bidirectional Encoder Representations from Transformers (BERT), have reduced the need for extensive amounts of labelled data in the categorization process. BERT offers strong contextual representations and excels in text classification tasks, particularly when labelled data is limited. To overcome the shortage of high-quality, large-scale crime-related labelled data, a BERT-based pre-trained language model was applied to categorize crime news gathered from Malaysian online newspapers. The labelled dataset used to train this model was compiled from BERNAMA (the Malaysian National News Agency) and manually labelled by crime investigation experts into 12 categories, including a non-crime class. The experimental results showed that the BERT-based model outperformed previous models, achieving the highest performance with an accuracy of 99.45%. This highlights the efficacy of BERT in classifying crime news, even with a small dataset.
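
As a concrete illustration, the sketch below shows how such a BERT-based classifier might be fine-tuned for the 12 categories using the Hugging Face Transformers library. This is a minimal sketch, not the paper's exact pipeline: the checkpoint name (bert-base-uncased), the example headline, the label index, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of fine-tuning a BERT classifier for 12 crime-news
# categories (11 crime classes plus a non-crime class, per the paper).
# Checkpoint, example text, label, and hyperparameters are assumptions.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

NUM_LABELS = 12

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=NUM_LABELS
)

# Hypothetical headline; the real training data came from BERNAMA articles.
texts = ["Police arrest three suspects in armed robbery case"]
labels = torch.tensor([2])  # illustrative label index

inputs = tokenizer(
    texts, padding=True, truncation=True, max_length=128, return_tensors="pt"
)

# One fine-tuning step: the forward pass returns a cross-entropy loss
# over the 12 logits when labels are supplied.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
optimizer.zero_grad()
outputs = model(**inputs, labels=labels)
outputs.loss.backward()
optimizer.step()

# Inference: the predicted category is the arg-max over the 12 logits.
model.eval()
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))
```

In practice this step would run over the full labelled dataset for several epochs; the classification head on top of BERT's pooled representation is what allows strong accuracy even with relatively little labelled data.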