DETECTING CRIMINAL CONTENT ON SOCIAL MEDIA USING MACHINE LEARNING MODELS: A CASE STUDY ON KAZAKH-LANGUAGE CONTENT

Authors

  • Gulshat Baispay, Senior Lecturer, Al-Farabi Kazakh National University, Almaty, Kazakhstan
  • Shynar Mussuraliyeva, PhD in Physics and Mathematics, Professor, Al-Farabi Kazakh National University, Almaty, Kazakhstan

Keywords

cybersecurity, online social networks, crime detection, deep learning, neural networks

Abstract

With the increasing prevalence of harmful content on social media, there is a growing need for automated systems capable of detecting criminal discourse online. This study focuses on the detection of crime-related content in Kazakh-language social media posts using machine learning and natural language processing (NLP) techniques. A multilingual corpus was compiled from social networks and annotated into seven categories: Noncrime, Assault, Burglary, Drugs, Homicide, Sex Offense, and Extremist. Both classical machine learning classifiers (e.g., Logistic Regression, Naive Bayes, Random Forest) and deep learning models were trained and evaluated using several text vectorization methods (TF-IDF, Word2Vec, CountVectorizer). Among the traditional models, Logistic Regression achieved the highest performance, with an F1-score of 0.9681. BERT, used as the primary deep learning model, demonstrated strong capability in identifying nuanced criminal content, especially in under-resourced languages such as Kazakh. The study underscores the effectiveness of modern NLP techniques for multilingual crime detection and contributes resources for future research on content moderation in low-resource linguistic environments.
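
The abstract reports an F1-score for a seven-class problem but does not say how the score was averaged across classes. As an illustration only, the sketch below computes per-class F1 and its unweighted (macro) average over the seven categories named above. The choice of macro averaging, the function names per_class_f1 and macro_f1, and the toy labels are all assumptions made for this example and are not taken from the study.

```python
from collections import Counter

# The seven categories named in the abstract.
LABELS = ["Noncrime", "Assault", "Burglary", "Drugs",
          "Homicide", "Sex Offense", "Extremist"]


def per_class_f1(y_true, y_pred, labels=LABELS):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Convention assumed here: F1 is 0.0 for a label that never appears.
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 (an assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Toy labels invented for illustration; not drawn from the study's corpus.
y_true = ["Noncrime", "Drugs", "Drugs", "Assault",
          "Homicide", "Burglary", "Extremist", "Sex Offense"]
y_pred = ["Noncrime", "Drugs", "Assault", "Assault",
          "Homicide", "Burglary", "Extremist", "Sex Offense"]

if __name__ == "__main__":
    for label, score in per_class_f1(y_true, y_pred).items():
        print(f"{label}: {score:.4f}")
    print(f"macro F1: {macro_f1(y_true, y_pred):.4f}")
```

Macro averaging weights every category equally, so a rare class such as Extremist affects the result as much as Noncrime does. A support-weighted or micro average would give a different figure on an imbalanced corpus, which is why the averaging scheme matters when comparing reported scores.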

Published

2025-06-22

How to Cite

Gulshat Baispay, & Shynar Mussuraliyeva. (2025). DETECTING CRIMINAL CONTENT ON SOCIAL MEDIA USING MACHINE LEARNING MODELS: A CASE STUDY ON KAZAKH-LANGUAGE CONTENT. Scientific Research and Experimental Development, (10). Retrieved from https://ojs.publisher.agency/index.php/SRED/article/view/6557