DETECTING CRIMINAL CONTENT ON SOCIAL MEDIA USING MACHINE LEARNING MODELS: A CASE STUDY ON KAZAKH-LANGUAGE CONTENT

Gulshat Baispay; Shynar Mussuraliyeva

Authors

Gulshat Baispay Senior Lecturer, Al-Farabi Kazakh National University, Almaty, Kazakhstan
Shynar Mussuraliyeva PhD in Physics and Mathematics, Professor, Al-Farabi Kazakh National University, Almaty, Kazakhstan

Keywords:

cybersecurity, online social networks, crime detection, deep learning, neural networks

Abstract

With the increasing prevalence of harmful content on social media, there is a growing need for automated systems capable of detecting criminal discourse online. This study focuses on the detection of crime-related content in Kazakh-language social media posts using machine learning and natural language processing (NLP) techniques. A multilingual corpus was compiled from social networks and annotated into seven categories: Noncrime, Assault, Burglary, Drugs, Homicide, Sex Offense, and Extremist. Both classical machine learning classifiers (e.g., Logistic Regression, Naive Bayes, Random Forest) and deep learning models were trained and evaluated using various text vectorization methods (TF-IDF, Word2Vec, CountVectorizer). Among traditional models, Logistic Regression achieved the highest performance, with an F1-score of 0.9681. BERT, used as the primary deep learning model, demonstrated strong capability in identifying nuanced criminal content, especially in under-resourced languages like Kazakh. The study underscores the effectiveness of modern NLP techniques for multilingual crime detection and contributes valuable resources for future research on content moderation in low-resource linguistic environments.
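To make the classical pipeline named in the abstract concrete, the following is a minimal sketch of TF-IDF vectorization followed by Logistic Regression, written with scikit-learn. It is not the authors' implementation: the toy posts are hypothetical English placeholders rather than items from the study's corpus, only three of the seven categories appear, the vectorizer and classifier settings are library defaults, and macro averaging of the F1-score is an assumption, since the abstract does not state which averaging was used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Hypothetical placeholder posts (NOT the study's data); labels reuse
# three of the seven category names listed in the abstract.
texts = [
    "lovely weather at the park today",
    "great football match last night",
    "someone broke into the shop and stole the cash register",
    "thieves broke a window and stole jewellery from the house",
    "police seized packages of illegal narcotics at the border",
    "dealer arrested for selling illegal narcotics near the school",
]
labels = ["Noncrime", "Noncrime", "Burglary", "Burglary", "Drugs", "Drugs"]

# TF-IDF features feeding a Logistic Regression classifier.
# Default word-level tokenization is an assumption of this sketch.
model = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Score on the training posts only to show the API; a real evaluation
# would use a held-out test split.
preds = model.predict(texts)
train_f1 = f1_score(labels, preds, average="macro")
print(f"macro F1 on the toy training set: {train_f1:.4f}")
```

Swapping `TfidfVectorizer` for `CountVectorizer`, or `LogisticRegression` for `MultinomialNB` or `RandomForestClassifier`, gives the other classical combinations mentioned in the abstract while leaving the rest of the sketch unchanged.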
