A Comprehensive Analysis Dashboard for Detecting Similar Saudi Twitter Accounts by Using Stylometric Features

Authors: Lamar Suliman Abd Aljaleel

Company/Institution: Fakieh School, Ministry of Education of Makkah Saudi Arabia, Highly Innovative Unique Foundation (HiUF), Kingdom of Saudi Arabia

Country: Kingdom of Saudi Arabia

This research addresses the problem of detecting similar Saudi Twitter accounts that criminals use to spread harmful ideologies while concealing their true beliefs. Existing methods focus on English texts and content-dependent features; they are inadequate for Arabic and do not account for the way topics vary across accounts written by the same author. This study proposes a model built on content-independent factors, using stylometric features to analyze writing style rather than content. Because it does not depend on topic, the approach can detect similar accounts that discuss different subjects, adapts more readily to the Arabic language, and is harder for criminals to evade simply by changing what they write about. The study evaluates several machine learning (ML) classifiers, such as Random Forest (RF) and XGBoost, alongside deep learning models such as TabNet and CNN, on a dataset of Saudi Twitter accounts involved in terrorism and racism. Results show that the ML models, particularly RF and XGBoost, outperform the deep learning models, and that including stylometric features improves accuracy. The contributions of this research are a new dataset, new stylometric features, a comparison of different models, and a dashboard for analyzing similar accounts, which together offer a more effective way to detect harmful activity on Arabic Twitter.
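
To illustrate what content-independent stylometric features can look like, the following is a minimal Python sketch. The feature set, the function names (stylometric_features, cosine_similarity), and the use of cosine similarity are all assumptions chosen for illustration; they are not the features or the models used in this study, which relies on trained classifiers such as RF and XGBoost.

```python
import math

# Hypothetical punctuation set (Latin and Arabic marks); an assumption for illustration.
PUNCTUATION = set(".,!?;:،؛؟")


def stylometric_features(tweets):
    """Return simple style features averaged over one account's tweets.

    None of these features looks at what the words mean, only at how
    the text is written. The feature list is illustrative, not the study's.
    """
    n = len(tweets)
    if n == 0:
        raise ValueError("need at least one tweet")
    total_chars = sum(len(t) for t in tweets)
    words = [w for t in tweets for w in t.split()]
    punct = sum(ch in PUNCTUATION for t in tweets for ch in t)
    return [
        total_chars / n,                                  # mean tweet length (characters)
        len(words) / n,                                   # mean words per tweet
        sum(len(w) for w in words) / max(len(words), 1),  # mean word length
        punct / max(total_chars, 1),                      # punctuation ratio
        sum(w.startswith("#") for w in words) / n,        # hashtags per tweet
        sum(w.startswith("@") for w in words) / n,        # mentions per tweet
    ]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    account_a = ["hi there!", "#tag @me ok"]
    account_b = ["well, well; what now?", "nothing: really."]
    fa = stylometric_features(account_a)
    fb = stylometric_features(account_b)
    print(cosine_similarity(fa, fb))
```

In this sketch the raw features are on different scales, so the similarity score is dominated by tweet length; a practical pipeline would likely standardize each feature and pass the vectors to a classifier rather than compare them directly.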